ABSTRACT

We report on a spectral principal component analysis (SPCA) of a sample of 816 quasars, selected to have
small Fe II velocity shifts with spectral coverage in the rest wavelength range 3500-5500 Å. The sample is
explicitly designed to mitigate spurious effects on SPCA induced by Fe II velocity shifts. We improve the 
algorithm of SPCA in the literature and introduce a new quantity, the fractional-contribution spectrum, that 
effectively identifies the emission features encoded in each eigenspectrum. The first eigenspectrum clearly 
records the power-law continuum and very broad Balmer emission lines. Narrow emission lines dominate the 
second eigenspectrum. The third eigenspectrum represents the Fe II emission and a component of the Balmer 
lines with kinematically similar intermediate velocity widths. Correlations between the weights of the eigen- 
spectra and parametric measurements of line strength and continuum slope confirm the above interpretation for 
the eigenspectra. Monte Carlo simulations demonstrate the validity of our method to recognize cross talk in 
SPCA and firmly rule out a single-component model for broad Hβ. We also present the results of SPCA for
four other samples that contain quasars in bins of larger Fe II velocity shift; similar eigenspectra are obtained.
We propose that the Hβ-emitting region has two kinematically distinct components: one with very large
velocities whose strength correlates with the continuum shape, and another with more modest, intermediate velocities
that is closely coupled to the gas that gives rise to Fe II emission.

Subject headings: line: profiles — methods: data analysis — methods: numerical — methods: statistical — 
quasars: emission lines — quasars: general 



1. INTRODUCTION 
1.1. Hβ Broad-line Region

The structure of the broad-line region (BLR) in active 
galactic nuclei (AGNs) is still poorly understood. A widely 
accepted concept, predicted from photoionization models 
(Collin-Souffrin & Lasota 1988) and supported by reverber- 
ation mapping observations (e.g., Peterson & Wandel 1999), 
is that the BLR is radially stratified: high-ionization lines are 
emitted from smaller radii than low-ionization lines. High- 
ionization lines such as C IV are thought to be emitted, at least 
in part, from an outflow (see Richards et al. 2011, and
references therein), while low-ionization lines such as Hβ
originate from a virialized region. It is the virialized component
that is pertinent to efforts to use the BLR to estimate the mass 
of the central black hole (BH). However, velocity-resolved 
reverberation data from recent monitoring programs indicate 
that the Hβ-emitting region is more complicated than previously
thought; depending on the object, infall, outflow, and virial- 
ized motions are all possible (e.g., Bentz et al. 2010; Denney 
et al. 2010, and references therein). 

The profile of the broad Hβ line also points to the
complexity of the Hβ-emitting region. It generally cannot be well
described by a single Gaussian. Two Gaussians (e.g., Netzer 
& Trakhtenbrot 2007; Hu et al. 2008a) or a Gauss-Hermite
function (e.g., Salviander et al. 2007; Hu et al. 2008b) are
often used for quasars, while a Lorentzian, a Lorentzian plus a
very broad Gaussian (e.g., Veron-Cetty et al. 2004), or two 
Gaussians (e.g., Mullaney & Ward 2008) are used for
narrow-line Seyfert 1 galaxies. In addition, the Hβ profile shows
great diversity from object to object (e.g., Hu et al. 2008a; 
Zamfir et al. 2010, and references therein). Some previous 
studies (e.g., Brotherton 1996; Sulentic et al. 2000b) propose 
a two-component model for Hβ emission, consisting of an
intermediate-width component and a very broad component. Netzer &
Marziani's (2010) calculations of the line profile rule out sim- 
ple single-zone models. 

Hu et al. (2008a,b) systematically investigated Fe II and 
Hβ emission in a large sample of quasars selected from the
Sloan Digital Sky Survey (SDSS; York et al. 2000). They
found that Fe II emission originates from an
intermediate-velocity region, located farther out from the center, whose
dynamics may be dominated by infall. The broad Hβ line can
be decomposed into two physically distinct components, one 
associated with the conventional BLR and another with the 
intermediate-velocity region identified through Fe II. Ferland 
et al. (2009) calculated the outward emission from infalling 
clouds and found that such a scenario can reproduce the ob- 
served Fe II emission. These studies demonstrate that fruitful 
insights can be gained from a self-consistent investigation of 
the kinematics of different emission lines. 

However, all the above work suffers from a major draw- 
back: the derived parameters for both the continuum and the 
emission lines are model-dependent. The profiles of the emis- 
sion lines are not necessarily well represented by the simple 
analytical functions (Gaussians, Lorentzians, Gauss-Hermite 
polynomials, or various combinations thereof) that are com- 
monly used to fit them. Spectral principal component analy- 
sis is an alternative, model-independent approach that can be 
used to study emission-line profiles. This method makes
use of all the emission features in the spectra, and it is
well suited for application to large data sets, such as that afforded
by SDSS.

1.2. Spectral Principal Component Analysis 

Principal component analysis (PCA) is a powerful mathe- 
matical tool used to reduce the dimensionality of a data set. 
It describes the variation in the data set by the fewest number 
of variables, called principal components (PCs). The method 
has been widely used for many purposes in many areas of 
astronomy, including studies on AGNs. The most common 
implementation is to apply it to a set of measured variables 
and seek correlations among them. Boroson & Green (1992, 
hereinafter BG92) applied it to the Palomar-Green sample of 
low-redshift quasars and discovered that the bulk of the vari- 
ance in the sample is dominated by the inverse correlation 
between the strengths of the Fe II and [O III] lines, commonly 
called Eigenvector 1 (EV1). 

Although applying PCA to a set of measured variables is 
suitable for multivariate correlation analysis, it has the short- 
coming that measuring the input variables is itself model- 
dependent. The parameters of emission lines are often mea- 
sured from fits using mathematically convenient functions. 
Some parameters, such as the shape and asymmetry, can be 
quantified without fitting the line, but fitting the continuum 
and Fe II emission first is always necessary. This shortcom- 
ing can be mitigated by spectral principal component analysis 
(SPCA). 

In SPCA, originally developed by Francis et al. (1992), in- 
stead of using measured variables, the fluxes in each wave- 
length bin are used as input variables. SPCA obviates the 
need to model the continuum, the pseudocontinuum (due to 
blended Fe II emission), or the line profiles. The resultant 
PCs are linear combinations of the fluxes in each wavelength 
bin, and so have the form of spectra; we refer to them as eigenspectra
hereafter. SPCA has been used very successfully in
classification (e.g., Francis et al. 1992; Yip et al. 2004b), in 
establishing empirical templates and reconstruction (e.g., Hao 
et al. 2005; Boroson & Lauer 2010), and in exploring "out- 
liers" (e.g., Boroson & Lauer 2010). It has also been widely 
adopted to study the physics of quasars, by means of inter- 
preting the first few eigenspectra. Table 1 summarizes pre- 
vious SPCA studies on AGNs, including their parent sample, 
number of sources, wavelength range of the eigenspectra, and, 
most importantly, their sample selection criteria, which affect 
the results. Table 2 lists the interpretations of the first three 
eigenspectra. 

A disadvantage of using SPCA to explore the physics of 
AGN emission stems from the fact that SPCA is a linear anal- 
ysis, while there are many nonlinear factors in the distribution 
of samples and also in the variance of quasar spectra. These 
nonlinear factors make a simple physical interpretation
of the eigenspectra very difficult. Many eigenspectra presented in
the literature do not show clear and clean features. As listed in 
Table 2, some interpretations have ambiguous physical mean- 
ing, and many vary from study to study. 

In the present paper, the first in a series, we aim to de- 
rive a set of eigenspectra of quasar optical spectra that has 
clear physical meaning, by considering the nonlinear factors 
described in §2.1, especially concerning the diversity of Fe 
II velocity shifts. Our samples are established in §2.2. In 
§3, we briefly describe our SPCA algorithm and the definition
of a new quantity, the fractional-contribution spectrum,
which helps to understand the eigenspectra; more details are 
given in the Appendices. For the sample of quasars with small 
Fe II velocity shift, Section 4 presents the eigenspectra we 
obtain, their physical interpretation, correlations between the 
weights and spectral measurements, bootstrap study, and fit- 
ting of eigenspectrum 3. We test our SPCA method and in- 
terpretation using Monte Carlo simulations in §5. Section 6 
presents the results for four other quasar samples with larger 
Fe II velocity shifts. The implications of our results are dis- 
cussed in §7, with a summary given in §8. 

In the second paper of this series, we will explore the ap- 
plication of eigenspectrum 3 as a template for Fe II and for 
the intermediate-width component of the Balmer lines. The 
two-component structure of the Hβ BLR suggested by our
SPCA results is relevant to BH mass estimates using the Hβ
line width. This issue will be investigated in a third paper.

2. SAMPLE SELECTION 

We use the sample defined in Hu et al. (2008b) as the parent 
sample. It is the largest sample to date that has measurements 
of Fe II velocity shifts. The parent sample was selected from 
the SDSS Fifth Data Release (DR5; Adelman-McCarthy et 
al. 2007) quasar catalog (Schneider et al. 2007) according to 
a series of criteria that ensure that reliable measurements can 
be obtained for Fe II emission. Briefly, the selection criteria 
include: (1) redshift z < 0.8 and signal-to-noise ratio (S/N) >
10 in the wavelength range 4430-5550 Å; (2) χ² < 4 in their
continuum decomposition; (3) equivalent width (EW) of Fe II
> 25 Å; (4) the Hβ broad component FWHM errors < 10%
and [O III] λ5007 peak velocity shift errors < 100 km s⁻¹.

2.1. Nonlinear Factors in SPCA 

Many nonlinear factors in SPCA have been studied in the 
literature. Variable line width is one of them. The line cen- 
ter and wing have opposite responses when varying only the 
line width. Its influence on SPCA has been studied exten- 
sively using simulations (e.g., Mittaz et al. 1990; Shang et al. 
2003). It produces the characteristic Fourier-like "W" shape 
on eigenspectra, a feature that has been seen in the majority 
of SPCA studies on quasar spectra. Another nonlinear factor 
that contributes to the variance of quasar spectra is the vari- 
able slope of the power-law continuum. Shang et al. (2003) 
studied the influence of the continuum slope and found that it 
introduces cross talk among eigenspectra, which manifests as 
the appearance of a spectral feature (e.g., an emission line) on 
an eigenspectrum with which it does not correlate. 
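
The effect of variable line width can be illustrated with a toy simulation. The sketch below is not the procedure of Mittaz et al. (1990) or Shang et al. (2003); it assumes unit-flux Gaussian lines on a hypothetical grid, with widths drawn from an arbitrary range, and checks that the leading principal component has opposite signs at the line center and in the wing, which is the origin of the "W" shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grid and widths; the values are illustrative only.
x = np.linspace(-10.0, 10.0, 401)            # offset from line center (arbitrary units)
sigmas = rng.uniform(1.5, 2.5, size=200)     # one Gaussian width per mock spectrum

# Unit-flux Gaussian lines that differ only in their width.
spectra = np.exp(-0.5 * (x / sigmas[:, None]) ** 2) / (sigmas[:, None] * np.sqrt(2.0 * np.pi))

# Subtract the mean spectrum and take the leading principal component via SVD.
resid = spectra - spectra.mean(axis=0)
_, _, vt = np.linalg.svd(resid, full_matrices=False)
pc1 = vt[0]

center = pc1[np.argmin(np.abs(x))]           # value at the line center
wing = pc1[np.argmin(np.abs(x - 4.0))]       # value in the line wing
print("PC1 at line center:", center)
print("PC1 in the wing:   ", wing)
```

Because a narrower line of fixed flux has a higher peak and weaker wings, the center and wing respond in opposite directions, and a single linear component has to change sign between them.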

The nonlinear effects caused by these variable parameters 
can be mitigated by judiciously selecting subsamples in which 
these parameters (e.g., Hβ line width) span a narrow range
of values. However, as discussed in §1.1, the Hβ profile is
too complicated to be described by a single width parameter.
Moreover, one of our primary goals is to study the Hβ profile
itself, and it would be counterproductive to restrict our sample 
using the very parameter we wish to investigate. We could 
attempt to subtract the power-law continuum first, but this, 
too, is not ideal because the continuum cannot be measured 
in a model-independent manner, thereby compromising the 
advantages to be gained by SPCA. 

The spectra of quasars also vary in the velocity shifts of 
emission lines. Broad quasar emission lines often show con- 
siderable velocity shifts with respect to narrow emission lines 
(Gaskell 1982). The velocity shift varies from source to
source and from line to line (see, e.g., introduction section
of Hu et al. 2008b for a brief review). Hu et al. (2008b)
systematically investigated the optical Fe II emission in a large
sample of quasars selected from SDSS, and found that the
velocity shifts of Fe II emission span a wide range. The
majority of quasars show Fe II emission that is redshifted by
up to ~1000 km s⁻¹, and some by as much as 2000 km s⁻¹, with respect
to the velocity of the narrow-line region as traced by [O III]
λ5007 or [O II] λ3727 (see Figure 7 of Hu et al. 2008b).

Table 1
The Samples in Previous SPCA Studies

Notation  Paper                   Parent sample  Number of sources  Wavelength range (Å)  Notes on sample selection
F92       Francis et al. (1992)   LBQS           232                1150-2000             Wavelength coverage
S03A      Shang et al. (2003)     BQS            22                 4000-5500             Low redshift and Galactic absorption
S03B      Shang et al. (2003)     BQS            22                 1171-6608             Same as above
Y04       Yip et al. (2004b)      SDSS           16707              900-8000              All in SDSS DR1 quasar catalog
L09ALL    Ludwig et al. (2009)    SDSS           9046               4000-6000             FWHM(Hβ BC) > 2000 km s⁻¹
L09S1     Ludwig et al. (2009)    SDSS           6317               4000-6000             Low EW([O III])
L09S2     Ludwig et al. (2009)    SDSS           2307               4000-6000             Intermediate EW([O III])
L09S3     Ludwig et al. (2009)    SDSS           422                4000-6000             High EW([O III])
B10       Boroson & Lauer (2010)  SDSS           1039               4000-5700             Wavelength coverage and spectral quality
H12       This work               SDSS           816                3500-5500             Small Fe II velocity shifts

Table 2
Eigenspectra Interpretations in Previous SPCA Studies

          Interpretations of Eigenspectra
Notation  1st                                       2nd                                       3rd
F92       Emission-line cores                       Continuum slope                           Broad absorption lines
S03A      BG92's EV1                                Balmer lines                              ...
S03B      Emission-line cores                       Continuum slope                           Line-width relationships
Y04       Host galaxy component (a)                 Continuum slope (a)                       Balmer emission lines (a)
L09ALL    Narrow emission lines                     Broad emission lines and continuum slope  ...
L09S1     Broad emission lines and continuum slope  Correlation among all emission            BG92's EV1
L09S2     Broad emission lines and continuum slope  Narrow emission lines                     Narrow emission-line shift
L09S3     Narrow emission-line EW                   Narrow emission-line shift or asymmetry   Narrow emission-line width
B10 (b)   BG92's EV1 (a)                            ...                                       ...
H12       Continuum and very broad emission lines   Narrow emission lines                     Intermediate-width emission lines

(a) Yip et al. (2004b) and Boroson & Lauer (2010) did not subtract a mean spectrum and thus their first eigenspectrum is the mean spectrum. Their
nth (n ≥ 2) eigenspectra are labeled as (n-1)th in the present paper.

(b) Boroson & Lauer (2010) did not aim to interpret the physical meaning of the eigenspectra, other than equating the first eigenspectrum with
BG92's EV1.

Recently, Sulentic et al. (2012) called into question the 
measurement of Fe II velocity shifts in Hu et al. (2008b) 
by testing the composite spectra of some subsamples. They 
claimed that the Fe II emission in their composite spectra does
not show significant velocity shifts [6], and concluded that there
is no evidence for a systematic Fe II redshift. They preferred
the older conclusion that the velocity shift and width of Fe II
follow those of Hβ in virtually all sources. However, the
properties of a composite spectrum depend on how the subsample
is selected. In contrast to the composite spectra in Sulentic et
al. (2012) defined by their 4D Eigenvector 1 formalism, the
composite spectra of quasars in bins of different Fe II velocity
shifts generated in Hu et al. (2008b) do show significantly
redshifted Fe II emission. Appendix A gives details. The results
from composite spectra in Appendix A provide additional
evidence that the Fe II velocity shift measurements in Hu et al.
(2008b) are reliable for the majority of sources.

[6] One composite spectrum (B1) in Sulentic et al. (2012) has a best-fit value
of Fe II velocity shift of 730 km s⁻¹. They claimed that the shift is not
distinguishable from zero by adopting the criterion that χ²/χ²_min should be larger
than 1.24. Their criterion is far more severe than that derived from an F-test of
an additional term (free versus fixed Fe II velocity shift) in the fitting. See
Appendix A for details.

The velocity shift of the emission line is a promising non- 
linear factor, but one that has been overlooked in previous 
SPCA studies. Prior to any analysis, all the spectra have been 
deredshifted to their rest frame. If the sources have different 
intrinsic velocity shifts, the effect on SPCA is equivalent to 
a situation in which the sources have no velocity shifts but 
their spectra are deredshifted using the wrong redshifts. Both 
cases will confuse the resultant eigenspectra, rendering any 
simple interpretation impossible. This effect can be avoided 
if we restrict the variance of the velocity shifts in the sample 
to be small — much smaller than the widths of the broad emis- 
sion lines. As this condition is satisfied by the narrow lines 
and broad Balmer lines in the optical spectra of quasars, it is 
reasonable that previous SPCA studies did not take velocity 
shifts into consideration. Fe II emission, which had not been 
well studied, was simply assumed to have no large velocity 
shift. But, as mentioned above, the dispersion in Fe II emis- 
sion shifts is comparable to the full width at half-maximum 
(FWHM) of the line (median value ~2500 km s⁻¹; Figure 8
of Hu et al. 2008b). Thus, the velocity shift of optical Fe II 
emission cannot be ignored in SPCA. 
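
A toy calculation shows why a spread in velocity shift that is comparable to the line width is harmful to a linear analysis. The sketch below is an illustration under stated assumptions, not the simulation of §5: it uses a single Gaussian line with FWHM = 2500 km s⁻¹ on a hypothetical velocity grid, draws shifts uniformly from a narrow range (-250 to 250 km s⁻¹) and from a wide range (0 to 2000 km s⁻¹), and compares the fraction of variance captured by the leading principal component.

```python
import numpy as np

rng = np.random.default_rng(1)
v = np.linspace(-8000.0, 8000.0, 801)        # hypothetical velocity grid (km/s)
sigma = 2500.0 / 2.3548                      # Gaussian sigma for FWHM = 2500 km/s

def first_pc_fraction(shifts):
    """Fraction of the variance captured by the leading PC for the given line shifts."""
    lines = np.exp(-0.5 * ((v - shifts[:, None]) / sigma) ** 2)
    resid = lines - lines.mean(axis=0)       # subtract the mean profile
    s = np.linalg.svd(resid, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

frac_small = first_pc_fraction(rng.uniform(-250.0, 250.0, 300))
frac_large = first_pc_fraction(rng.uniform(0.0, 2000.0, 300))
print("narrow range of shifts:", frac_small)
print("wide range of shifts:  ", frac_large)
```

With small shifts the variation is nearly linear and one component suffices; with large shifts a single physical parameter leaks into several components, which is the kind of confusion the sample selection is meant to avoid.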

To summarize this subsection: the variance in Fe II emis- 
sion velocity shifts in quasars is an important nonlinear factor. 
It must be treated carefully in any SPCA study. To minimize 
the adverse effects of this nonlinear factor, we will construct 
subsamples in which Fe II emission has similar velocity shifts. 
2.2. Our SPCA Sample

We select the primary sample for SPCA from the parent 
sample using the following additional criteria. (1) We choose 
objects with z < 0.67 in order to obtain eigenspectra
covering the rest-frame wavelength range 3500-5500 Å. The red
end ensures that we include the Fe II emission redward of
[O III] λ5007 in the eigenspectra. (2) The velocity shifts of
Fe II emission are in the range -250 to 250 km s⁻¹, to ensure
that the influence of this nonlinear factor can be suppressed.
(3) Following Boroson & Lauer (2010), we select spectra that
have no more than 100 pixels flagged as bad by the SDSS
pipeline in the rest-frame wavelength range 3500-5500 Å.
The final primary sample for SPCA contains 816 sources, all 
of which have small Fe II velocity shifts. 

The Fe II velocity shift limits adopted in criterion (2) above
follow those for subsample A in §4.6 of Hu et al. (2008b).
Four other bins of Fe II velocity shift were used in Hu et al.
(2008b) (see their Figure 12). In this paper, we also establish
four other subsamples for SPCA, to explore the consequences
of varying the velocity shift criterion, while keeping the other
criteria unchanged. The numbers of sources in the other four
subsamples are 794 (B, 250-750 km s⁻¹), 338 (C, 750-1250
km s⁻¹), 203 (D, 1250-1750 km s⁻¹), and 110 (E, 1750-2250
km s⁻¹).

The results of SPCA for the four bins with large Fe II
velocity shifts are presented only in §6. Unless otherwise noted,
the SPCA results in this paper refer to the primary sample of
816 quasars with Fe II velocity shift in the range -250 to 250
km s⁻¹ (sample A).
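
The assignment of sources to subsamples A-E can be sketched as a simple binning of the Fe II velocity shift. The function name `assign_subsample` is hypothetical, and the treatment of values that fall exactly on a bin edge (lower edge inclusive, upper edge exclusive) is an assumption, since the text does not specify it.

```python
import numpy as np

# Bin edges (km/s) for subsamples A-E, taken from the ranges quoted in the text.
EDGES = np.array([-250.0, 250.0, 750.0, 1250.0, 1750.0, 2250.0])
LABELS = np.array(["A", "B", "C", "D", "E"])

def assign_subsample(v_feii):
    """Return the subsample label for each Fe II shift, or '' if it lies outside all bins."""
    v = np.asarray(v_feii, dtype=float)
    idx = np.digitize(v, EDGES) - 1          # 0..4 inside the bins, -1 or 5 outside
    inside = (idx >= 0) & (idx < len(LABELS))
    out = np.full(v.shape, "", dtype="<U1")
    out[inside] = LABELS[idx[inside]]
    return out

labels = assign_subsample([0.0, 300.0, 1000.0, 1500.0, 2000.0, 3000.0])
print(labels)
```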

3. METHODOLOGY 

Apart from employing the same algorithm to obtain the
eigenspectra from the cross-correlation matrix, previous
studies differ in many respects. Table 1 lists the main
differences among previous studies, including sample selection
and wavelength coverage. These are the main reasons that 
they obtained different eigenspectra and interpretations. Ad- 
ditional differences arise from different implementation of a 
number of technical details in the analysis, such as whether 
the mean spectrum is subtracted or not, how the spectra are 
normalized, and how to deal with noise and gaps in the data. 

The present work incorporates two major improvements. 
(1) We isolate a sample of quasars with a narrow range of Fe II 
velocity shifts to minimize nonlinear effects in SPCA. (2) We 
introduce a new quantity, called the fractional-contribution 
spectrum; this parameter will turn out to be useful in under- 
standing the physical meaning of the eigenspectra. Our tech- 
nique adopts a slightly different method of normalizing the 
spectra compared to previous studies. 

We apply SPCA to the 816 quasar spectra selected above. 
The process of constructing the cross-correlation matrix 
mainly follows Francis et al. (1992). In detail, the steps are as 
follows. 

1. Each spectrum is corrected for Galactic extinction using
the u-band extinction listed in Schneider et al. (2007) and the
extinction law of Cardelli et al. (1989) and O'Donnell (1994).
We then shift the spectrum to its rest frame using the redshift
given in Hu et al. (2008b), which is based on [O III] λ5007.
Although [O III] can be blueshifted, we adopt it to define the
zero point of the velocity because the measurements of Fe II
velocity shift were calculated with respect to [O III] in Hu et
al. (2008b) (see their §3.5 for details). After deredshifting,
all the spectra are rebinned to fixed wavelength bins, which
are equally divided in logarithmic space with a dispersion of
dλ/λ = 10⁻⁴, covering the range 3500-5500 Å. The
dispersion is selected to be equal to that of SDSS spectra, and the
wavelength range is covered by all the spectra in our sample.
There are 1962 wavelength bins for each rebinned spectrum.

2. The rebinned spectrum is normalized to unity average
flux density over the wavelength interval 5075-5125 Å.
Normalization by the scalar product of the spectrum (e.g., Yip
et al. 2004b; Boroson & Lauer 2010) or by the integrated
flux (e.g., Francis et al. 1992; Shang et al. 2003; Ludwig et
al. 2009) was adopted in previous work. We tested these
three alternative methods of normalization and found that the
derived eigenspectra are similar (see also the test in §3.2 of
Connolly et al. 1995). However, we found that the
normalization method adopted here (unit flux density
at 5100 Å) produces a more meaningful fractional-contribution
spectrum (see Appendix C for the definition), which helps us
to interpret the physical meaning of the eigenspectra.
Following Francis et al. (1992, see their §3.2), we do not scale
the flux values for the sample in each wavelength bin to unity
variance.

3. We calculate the mean spectrum of the normalized spec- 
tra using all the good pixels and then subtract it from each of 
the original spectra. Some previous studies do not subtract 
the mean spectrum (e.g., Yip et al. 2004b; Boroson & Lauer 
2010). In that case, the first eigenspectrum is equal to the 
mean spectrum, and their nth (n ≥ 2) eigenspectrum should
be compared with the (n-1)th eigenspectrum in the present
paper (see also Table 2). 
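
Steps 1-3 can be sketched as follows. This is a minimal illustration, not the pipeline used in the paper: it assumes the spectra are already extinction-corrected and deredshifted, takes the grid step to be 10⁻⁴ in log10 of wavelength (the SDSS pixel size, which reproduces the 1962 bins quoted above), uses linear interpolation in place of a flux-conserving rebinning, and ignores bad-pixel masks. The function names are hypothetical.

```python
import numpy as np

def make_log_grid(lam_min=3500.0, lam_max=5500.0, step=1e-4):
    """Wavelength grid uniform in log10(lambda); step = 1e-4 is assumed (SDSS pixel size)."""
    n_bins = int(np.floor((np.log10(lam_max) - np.log10(lam_min)) / step))
    return 10.0 ** (np.log10(lam_min) + step * np.arange(n_bins))

def preprocess(rest_wave, flux, grid, window=(5075.0, 5125.0)):
    """Rebin rest-frame spectra onto `grid`, normalize, and subtract the mean spectrum.

    rest_wave, flux: sequences of 1-D arrays, one pair per quasar.
    Returns the mean-subtracted flux matrix (n_spec x n_bins) and the mean spectrum.
    """
    rebinned = np.array([np.interp(grid, w, f) for w, f in zip(rest_wave, flux)])
    in_win = (grid >= window[0]) & (grid <= window[1])
    # Normalize each spectrum to unit average flux density in the 5075-5125 A window.
    normalized = rebinned / rebinned[:, in_win].mean(axis=1, keepdims=True)
    mean_spec = normalized.mean(axis=0)
    return normalized - mean_spec, mean_spec

grid = make_log_grid()
print(len(grid))
```

Each row of the returned matrix is one mean-subtracted spectrum, which is the input expected by the decomposition described next.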

The above steps result in a mean-subtracted flux matrix F,
whose (i, j) element is the flux of the ith mean-subtracted
spectrum in the jth wavelength bin. The goal of SPCA is to
calculate the eigenvalues and eigenvectors of the cross-correlation
matrix $F^T \cdot F$, where $F^T$ denotes the transpose of matrix F.
This is achieved using singular value decomposition. The
iteration method of Yip et al. (2004a) is adopted to deal with
the noise and bad pixels in the spectra. Then, the
mean-subtracted spectra are reconstructed as $F = W \cdot V^T$, where V
has columns $v_k$ as the eigenspectra and W is a matrix whose
element $w_{ik}$ is the weight of the kth eigenspectrum for the ith
mean-subtracted spectrum. The algorithm is described in
detail in Appendix B.
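
The decomposition can be sketched with a plain singular value decomposition. This sketch omits the iterative treatment of noise and bad pixels of Yip et al. (2004a), so it is only a simplified stand-in for the algorithm of Appendix B; the function name `spca` and the default of 30 retained eigenspectra are illustrative assumptions.

```python
import numpy as np

def spca(F, n_keep=30):
    """Eigenspectra and weights of a mean-subtracted flux matrix F (n_spec x n_bins)."""
    # Thin SVD: F = U diag(s) Vt; the rows of Vt are eigenvectors of F^T F.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    V = Vt[:n_keep].T                  # columns are the eigenspectra v_k
    W = U[:, :n_keep] * s[:n_keep]     # w_ik: weight of eigenspectrum k in spectrum i
    return W, V, s

# Mock data standing in for real spectra.
rng = np.random.default_rng(2)
F = rng.normal(size=(50, 200))
F -= F.mean(axis=0)
W, V, s = spca(F, n_keep=50)
print(np.allclose(F, W @ V.T))
```

Keeping every component reproduces F exactly; truncating `n_keep` gives the best low-rank reconstruction in the least-squares sense.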

The eigenspectra are arranged in the order of decreasing
eigenvalues $\lambda_k$, which are also used to calculate the
contribution of each eigenspectrum to the total variation of the input
spectra as $P_k = \lambda_k^2 / \sum_i \lambda_i^2$ (Mittaz et al. 1990). Thus, the first
n eigenspectra account for a fraction $\sum_{k=1}^{n} P_k$ of the
total variation. This quantity is used to determine how many
eigenspectra are sufficient for understanding the input
spectra, by comparing it to the intrinsic variation in percentage $P^{\rm int}$
(Francis et al. 1992). The quantities above are widely used in
previous SPCA studies, but only have meaning as an average
over the entire wavelength range. In this work, we develop
them into more sophisticated forms that apply to each
wavelength bin, which are helpful in interpreting the eigenspectra
in a physical way.
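
The bookkeeping for the proportion of variation can be sketched as below, taking $P_k = \lambda_k^2 / \sum_i \lambda_i^2$ with the $\lambda_k$ sorted in decreasing order. The input values are hypothetical, and the helper names are not from the paper.

```python
import numpy as np

def variance_fractions(lam):
    """P_k = lam_k^2 / sum_i lam_i^2 for values lam sorted in decreasing order."""
    lam = np.asarray(lam, dtype=float)
    return lam ** 2 / np.sum(lam ** 2)

def n_required(P, P_int):
    """Smallest n for which the cumulative sum P_1 + ... + P_n reaches P_int."""
    cumulative = np.cumsum(P)
    return int(np.searchsorted(cumulative, P_int) + 1)

P = variance_fractions([4.0, 2.0, 1.0, 1.0])   # hypothetical values
n = n_required(P, 0.85)                        # hypothetical intrinsic fraction
print(P, n)
```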

We calculate the intrinsic variation and residual variation in
each wavelength bin as Francis et al. (1992) did, but do not
integrate them over wavelength. In an arbitrary jth wavelength
bin, we define $p_j^{\rm int}$ as the proportion of the intrinsic variation,
and $p_{jk}$ represents the proportion of the variation accounted
for by the kth eigenspectrum in this particular wavelength bin.
Figure 1. Cumulative proportion of variation accounted for by eigenspectra
of the first nth order. The horizontal dashed line marks the intrinsic varia- 
tion in percentage, and the cross marks where the cumulative proportion of 
variation exceeds the intrinsic variation. 

Thus, we obtain a matrix P, which has the same shape as V. 
Each column ($p_k$) of P represents the contribution of the
corresponding eigenspectrum and has the form of a spectrum.
We call it the fractional-contribution spectrum. Using the 
fractional-contribution spectrum, it is straightforward to un- 
derstand which spectral feature is dominated by which eigen- 
spectrum. The details of our definitions and the comparison 
to previously used quantities are presented in Appendix C. 
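
Since the formal definition is deferred to Appendix C, the sketch below uses an assumed form of the fractional-contribution spectrum: the share of the total variation in wavelength bin j that is carried by eigenspectrum k, $p_{jk} = \lambda_k^2 v_{jk}^2 / \sum_i f_{ij}^2$, where $f_{ij}$ is the (i, j) element of F. This ignores the separation of noise from intrinsic variation, so it may differ from the definition actually adopted in the paper.

```python
import numpy as np

def fractional_contribution(F, n_keep=None):
    """Assumed definition: p_jk = s_k^2 v_jk^2 / sum_i f_ij^2, shape (n_bins, n_keep)."""
    _, s, Vt = np.linalg.svd(F, full_matrices=False)
    V = Vt.T if n_keep is None else Vt[:n_keep].T
    s = s[: V.shape[1]]
    total = np.sum(F ** 2, axis=0)            # total variation in each wavelength bin
    return (V ** 2) * (s ** 2) / total[:, None]

# Mock mean-subtracted flux matrix standing in for real spectra.
rng = np.random.default_rng(3)
F = rng.normal(size=(40, 120))
F -= F.mean(axis=0)
P = fractional_contribution(F)
print(P.shape, np.allclose(P.sum(axis=1), 1.0))
```

Under this assumed definition the contributions of all eigenspectra sum to unity in every bin, so a column of P shows directly which spectral features a given eigenspectrum dominates.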

4. RESULTS 

In this section, we will first describe and interpret the de- 
rived eigenspectra phenomenologically, and then we will in- 
vestigate correlations between the weights of the eigenspectra 
and the actual measurements derived from the original spec- 
tra, to confirm the interpretation. Bootstrap is performed to 
study the stability and uncertainty of the eigenspectra. Lastly, 
we will fit eigenspectrum 3 to demonstrate that it consists of 
a single, intermediate-width emission-line component. 

4.1. Eigenspectra: Description

Figure 1 shows the cumulative proportion of variation
accounted for by eigenspectra added successively. The intrinsic
variation, in percentage, is $P^{\rm int}$ = 87.62% (horizontal dashed
line); it can be accounted for by the first 28 eigenspectra
($\sum_{k=1}^{28} P_k$ = 87.64%). This confirms that it is sufficient to
consider only the first 30 eigenspectra (Appendix B).

The upper panel of Figure 2(a) shows the mean spectrum of
the normalized spectra. As it resembles the composite quasar
spectrum of Vanden Berk et al. (2001), it implies that the
sample for SPCA used here is not strongly biased. The lower
panel shows $p_j^{\rm int}$, the proportion of the intrinsic variation in
each wavelength bin. It drops to nearly zero around 5100 Å because
we normalized the spectrum to unity over 5075-5125 Å. $p_j^{\rm int}$
rises toward shorter wavelengths because of the blue
power-law continuum of quasars. As expected, $p_j^{\rm int}$ also shows that
the most variable emission features are the Balmer lines,
[O III], and Fe II.



Figures 2(b)-(f) show the first five eigenspectra (upper 
panels) and the corresponding fractional-contribution spectra 
(lower panels). The number in the upper panel indicates the 
proportion of the variation accounted for by each
eigenspectrum, as defined in Equation (C3). Apart from the first five
shown here, higher order eigenspectra each contribute only
a tiny proportion of the variation ($P_k$ < 1% for k ≥ 6). The
first three eigenspectra are the most prominent, not only
because they contribute, respectively, 61.5%, 12.0%, and 4.9%
of the variation in total, but also because they account for
more than ~50% of the variation in many wavelength bins.
Moreover, the three eigenspectra resemble realistic compo- 
nents in quasar spectra. We will describe them phenomeno- 
logically below. 

Eigenspectrum 1 clearly consists of two components: a 
power-law continuum and Balmer emission lines. The Balmer 
continuum is also present, because the blue end of the eigen- 
spectrum is steeper than a simple power law extrapolated from 
the red end. The Balmer emission lines in this eigenspectrum 
are broader than those in the mean spectrum. The fractional
contribution decreases from ~100% at the blue end to ~0%
at the red end; considering our adopted normalization, this
indicates that this eigenspectrum represents the change of the
slope of the power-law continuum. In detail, the fractional- 
contribution spectrum dips near the center of the Balmer lines, 
indicating eigenspectrum 1 contributes only to the line wings. 

Eigenspectrum 2 and its fractional-contribution spectrum
clearly reveal a series of narrow emission-line features,
including [O II] λ3727, [Ne III] λ3869, [O III] λ4363, He II
λ4686, [O III] λλ4959, 5007, and Balmer lines. Note that
narrow He II is weak in the eigenspectrum but prominent in
the fractional-contribution spectrum. This suggests that weak
He II also varies and correlates with other narrow emission
lines. Eigenspectrum 2 accounts for almost 100% at the
center of [O III] λλ4959, 5007, meaning that it causes almost all
the variation in narrow emission lines.

Eigenspectrum 3 accounts for only 4.9% of the total varia- 
tion but contributes significantly in several narrow wavelength 
bands. Figure 2(d) exhibits clean Fe II emission, as well as 
Balmer lines that are narrower than those in eigenspectrum 
1 but broader than those in eigenspectrum 2. The fractional- 
contribution spectrum in the lower panel indicates that this 
eigenspectrum mainly contributes Fe II emission and the core 
of the Balmer lines. Note that although the eigenspectrum 
shows negative narrow features at the wavelengths of [O II], 
[Ne III], and [O III], it accounts for almost no variation in 
these wavelength bins. Thus, these negative features are nei- 
ther absorption lines nor inverse correlations between the nar- 
row emission lines and Fe II; they arise only from cross talk 
(Shang et al. 2003). Otherwise, as demonstrated by the
simulations in §5.1, the fractional contribution would be large at the
wavelengths of these lines.

Eigenspectra 4 and 5 do not resemble realistic spectra,
but show "W"-shaped features that represent variations in
emission-line widths (Mittaz et al. 1990). They contribute
variation in the same wavelength bands where they show the
"W" shapes. Eigenspectrum 4 mainly contributes to the
variation of Hβ wings and Fe II, indicating the correlation between
Hβ width and Fe II strength. Eigenspectrum 5 contributes
~40% variation at the blue wings of [O III] λλ4959, 5007,
suggesting that it represents the blueshifted wings of the
[O III] lines. It also accounts for a fraction of the Fe II variation,
indicating a correlation between Fe II emission and the [O III]
blue wing.

Figure 2. (a) Mean spectrum (upper panel) and proportion of the intrinsic variation (lower panel). (b)-(f) First five eigenspectra (upper panels) and the
corresponding fractional-contribution spectra (lower panels). The numbers in the upper panels indicate the contribution of the eigenspectra to the total variation over
the entire wavelength range (defined in Equation [C3]), which is used traditionally.

Figure 3. Normalized cumulative fractional-contribution spectra for the first
five eigenspectra, increasing monotonically from top to bottom. The
horizontal dashed line in each panel marks 100%.

Figure 3 shows cumulative fractional-contribution spectra
normalized to the proportion of the intrinsic variation,
$\sum_{k=1}^{n} p_{jk} / p_j^{I}$, for the first five eigenspectra. $p_j^{I}$ has been
smoothed with a 3-pixel boxcar filter to avoid abrupt spikes
around 5100 Å in the normalized cumulative fractional-contribution
spectra. These spectra reinforce our interpretation above.
Adding eigenspectrum 2 increases the contribution to narrow
emission lines, especially [O III], from a few percent to almost
100%. $\sum_{k=1}^{3} p_{jk}$, compared to $\sum_{k=1}^{2} p_{jk}$, shows significant
Fe II and intermediate-width Balmer emission lines. Adding
eigenspectra 4 and 5 slightly changes the wings of Hβ and
[O III], respectively. The normalized $\sum_{k=1}^{n} p_{jk}$ already
approximates 100% (the horizontal dashed line) in many wavelength
bins when n = 5, and reaches almost unity over the entire
wavelength range when the first 28 eigenspectra are added
(see also Figure 1).
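
As a rough illustration of the cumulative quantity plotted in Figure 3, the sketch below computes a per-bin fractional contribution from a plain PCA. It assumes, for illustration only, that $p_{jk}$ is the share of the variance in wavelength bin j carried by eigenspectrum k, i.e., $\lambda_k e_{jk}^2$ divided by the total variance in that bin. Equation (C7) is not reproduced here and may differ, in particular in its treatment of noise and of the intrinsic variation $p_j^{I}$, which this sketch ignores. The function name and the toy data are hypothetical.

```python
import numpy as np

def fractional_contribution(spectra):
    """PCA of spectra (n_objects, n_bins) plus an ASSUMED fractional contribution.

    p[j, k] is taken to be the share of the variance in bin j that is
    carried by eigenspectrum k; this is an illustrative definition.
    """
    x = spectra - spectra.mean(axis=0)        # subtract the mean spectrum
    cov = x.T @ x / (x.shape[0] - 1)           # covariance between wavelength bins
    eigval, eigvec = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]           # reorder by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    eigval = np.clip(eigval, 0.0, None)        # guard against tiny negative values
    var_j = np.diag(cov)                       # total variance in each bin
    p = eigval[None, :] * eigvec**2 / var_j[:, None]
    return eigvec, eigval, p

rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 30))               # toy data, not real spectra
eigvec, eigval, p = fractional_contribution(toy)
cumulative = np.cumsum(p, axis=1)              # contribution of the first n eigenspectra
```

With this definition each row of `p` sums to unity, so the cumulative curves approach 100% as more eigenspectra are added, which mirrors the behavior described for Figure 3.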

In summary, from a phenomenological examination of the 
eigenspectra and the corresponding fractional-contribution 
spectra, the first three eigenspectra represent the power-law 
continuum + very broad emission lines, narrow emission 
lines, and intermediate-width emission lines, respectively. In 
the next subsection, we will investigate the correlations be- 
tween the weights of the eigenspectra and some measured 
variables, to confirm the interpretation of the eigenspectra 
above. 

4.2. Eigenspectra: Correlations 

The weights of the eigenspectra used in the reconstruction
(W in Equation [B3]) have two uses: (1) they can reveal
subpopulations according to their distributions and then be
used for classification (e.g., Francis et al. 1992; Ludwig et al.
2009; Boroson & Lauer 2010); (2) they can be used to seek
correlations with other measured quantities to elucidate their
physical meaning. We follow the latter approach in this paper.

Among the large number of quantities measured by Hu et
al. (2008b) for their quasar sample, we explore the power-law
spectral index of the continuum (α, defined as $f_\lambda \propto \lambda^{\alpha}$),
the EW of broad Hβ [EW(Hβ_BC)], the EW of [O III] λ5007
[EW([O III])], and the EW of the Fe II emission between 4434 Å
and 4686 Å [EW(Fe II)]. We choose these three emission lines because
they are the typical lines with broad, intermediate, and narrow
line widths in the optical band (Hu et al. 2008b). For simplicity,
all EW measurements refer to the continuum at 5100 Å
(e.g., Boroson & Green 1992); thus, they are not exactly the
luminosity ratios of the emission lines to the ionizing continuum
(see §7.2 below for more discussion). EW(Hβ_BC) is derived
from their Gauss-Hermite fitting for the broad Hβ component,
and EW([O III]) comes from the sum of two Gaussian
components if a second Gaussian is needed. For the sources
in our sample, the broad Hβ component and [O III] are fitted
very well by Gauss-Hermite and double-Gaussian functions,
respectively; EW(Hβ_BC) and EW([O III]) used here refer to
the EWs for the entire emission line and depend little on the
model used for the profile fitting.

Figure 4 shows the weights of the first three eigenspectra
plotted against the measurements mentioned above. The correlation
between each pair of quantities is tested by calculating
the Pearson correlation coefficient $r_P$, which is given
in each panel. The weight of eigenspectrum 1 strongly correlates
with α and EW(Hβ_BC), with $r_P$ = -0.863 and 0.579,
respectively, supporting the proposition that eigenspectrum 1
represents both the continuum and the Balmer emission lines.
For the weight of eigenspectrum 2, the strongest correlation
is with EW([O III]), yielding $r_P$ = 0.875, while eigenspectrum
3 correlates best with EW(Fe II), with $r_P$ = 0.418. These results
indicate that eigenspectrum 2 represents narrow emission
lines and eigenspectrum 3 corresponds to Fe II emission.
At the same time, the fact that each of the four measurements strongly
correlates with only one of the weights demonstrates the validity
of using these eigenspectra to decompose quasar spectra.
The four strongest correlations are fitted by the OLS bisector
method (Isobe et al. 1990), and the best fits are
shown as solid lines in the corresponding panels. The four
fitted lines match the trends of the data points well. The only
exception is the correlation between $w_3$ and EW(Fe II) (top-right
panel), for which the data follow a somewhat flatter trend. This deviation is
induced by the bias in our sample. Our sample selection excludes
objects with EW(Fe II) < 25 Å; this cutoff imposes an
artificial sharp edge, which flattens the trend.
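
The two statistics used above can be sketched in a few lines. The block below computes the Pearson coefficient and the OLS bisector slope using the closed-form bisector expression of Isobe et al. (1990), which combines the slopes of the Y-on-X and X-on-Y regressions. The function names and the toy numbers are hypothetical and are not measurements from this sample.

```python
import math

def _moments(x, y):
    """Return means and centered sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / math.sqrt(sxx * syy)

def ols_bisector(x, y):
    """Slope and intercept of the line bisecting OLS(Y|X) and OLS(X|Y)."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    b1 = sxy / sxx                      # slope of the regression of y on x
    b2 = syy / sxy                      # slope of x on y, expressed as dy/dx
    slope = (b1 * b2 - 1.0
             + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    return slope, my - slope * mx       # the bisector passes through the centroid

# Toy numbers standing in for eigenspectrum weights and EWs (not real data).
weights = [0.1, 0.4, 0.5, 0.9, 1.2, 1.4]
ews = [12.0, 19.0, 25.0, 31.0, 44.0, 46.0]
r_p = pearson_r(weights, ews)
slope, intercept = ols_bisector(weights, ews)
```

The bisector treats the two variables symmetrically, which is a common choice when neither quantity is clearly the independent one.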

The results of the correlation analysis are consistent 
with the phenomenological analysis on the eigenspectra and 
fractional-contribution spectra presented in the last subsec- 
tion. 

4.3. Bootstrap: Stability and Uncertainty of the 
Eigenspectra 

Outliers with rare features in their spectra (e.g., those with
low-ionization broad absorption lines, which may exist in
our sample) affect the results of SPCA. The instability thus
induced in the eigenspectra can be tested using a bootstrap
method (e.g., Wang et al. 2011), which is adapted here.

In each bootstrap realization, a resample of 816 spectra is
obtained by random sampling with replacement from the original
sample, such that some spectra might be absent while
some might be present more than once. Then, the same steps
described in §3 and Appendix B are performed to derive the
eigenspectra. We repeat the bootstrap 100 times and obtain
100 sets of eigenspectra. Figure 5 shows the standard
deviation (1σ) of the first five eigenspectra. The eigenspectra are
quite stable, demonstrating that our sample selection is suitable
for SPCA. Among the five, eigenspectra 2 and 3 have
the smallest uncertainty, which is understandable if the two
each represent a single emission-line component (narrow and
intermediate-width emission lines, respectively), as we suggest.

Figure 4. Weights of the first three eigenspectra versus spectral index α and the EWs of the broad component of Hβ, Fe II, and [O III]. The number in each panel is
the Pearson correlation coefficient $r_P$ for the two quantities plotted. For cases with $|r_P|$ > 0.4, the solid line shows the fit using the OLS bisector method (Isobe et al.
1990). See text for the deviation between the fit and the data points in the top-right panel.

Figure 6. Fitting of eigenspectrum 3. The top panel shows the mean eigenspectrum
3 (green) and the model (red), which consists of three components
(blue), including a Legendre polynomial "continuum," a set of Gaussians to
represent Fe II and Balmer lines, and a set of Gaussians with negative intensity
to represent the [O III] lines that arise from cross talk. The bottom panel
shows the residuals.

The standard deviations of the eigenspectra given by the 
bootstrap method also serve as an error estimate, which is 
helpful for the fitting below. 
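
The bootstrap procedure can be sketched as follows. This is a minimal illustration rather than the pipeline of §3 and Appendix B: the eigenspectra come from a plain SVD of the mean-subtracted spectra, and the sign-alignment step is an added assumption, needed because an eigenvector is only defined up to sign. All names and the toy data are hypothetical.

```python
import numpy as np

def leading_eigenspectra(spectra, n_keep):
    """First n_keep eigenspectra (rows) of mean-subtracted spectra."""
    x = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)   # rows of vt are eigenspectra
    return vt[:n_keep]

def bootstrap_eigenspectra(spectra, n_keep=3, n_boot=100, seed=0):
    """Mean and standard deviation of the eigenspectra over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n_obj, n_bins = spectra.shape
    reference = leading_eigenspectra(spectra, n_keep)
    draws = np.empty((n_boot, n_keep, n_bins))
    for i in range(n_boot):
        idx = rng.integers(0, n_obj, size=n_obj)        # sample with replacement
        eig = leading_eigenspectra(spectra[idx], n_keep)
        # Assumption: flip each eigenspectrum to match the sign of the reference.
        sign = np.sign(np.sum(eig * reference, axis=1))
        sign[sign == 0] = 1.0
        draws[i] = eig * sign[:, None]
    return draws.mean(axis=0), draws.std(axis=0)

# Toy data: one Gaussian "line" of varying strength plus noise (not real spectra).
rng = np.random.default_rng(1)
profile = np.exp(-0.5 * ((np.arange(40) - 20.0) / 3.0) ** 2)
toy = rng.uniform(1.0, 3.0, size=(120, 1)) * profile + 0.05 * rng.normal(size=(120, 40))
mean_eig, std_eig = bootstrap_eigenspectra(toy, n_keep=2, n_boot=20)
```

The sketch does not handle eigenspectra that swap order between resamples, which can happen when two eigenvalues are nearly equal.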

4.4. Fitting the Eigenspectra 

As shown in Figure 2, for either eigenspectrum 1 or 2, it is 
easy to identify the emission lines and conclude that within 
each eigenspectrum the lines have roughly the same profile. 
For eigenspectrum 3, the Balmer lines are clear and obviously
narrower than those in eigenspectrum 1. But while the
blended Fe II lines are significant, the profile of each single 
Fe II line cannot be resolved without fitting. It is not obvi- 
ous whether all the Fe II lines have the same profile, and if 
so, whether they have the same profile as the Balmer lines 
in eigenspectrum 3. Thus, we fit the mean eigenspectrum 
3 given by the bootstrap method in §4.3, using the standard 
deviation derived by bootstrap as the error, to explore these 
questions. 

As shown in Figure 6, the mean eigenspectrum 3 can be
fitted well with the following three components. (1) A Legendre
polynomial to model the "continuum." (2) A set of single
Gaussians with the same velocity shift and width, the intensities
of which are free to vary. Each Gaussian is an emission
line, including Balmer lines from Hβ to Hη, and Fe II lines.
We assemble the line list for Fe II from Veron-Cetty et al.
(2004) and Sigut & Pradhan (2003). (3) Another set of lines,
each of which is modeled by a single Gaussian with negative
intensity and fixed width and shift, to represent the cross-talk
features at the location of [O III] λλ4959, 5007 and [Ne III]
λ3869.

Excluding the cross-talk component (the third component
above), every emission line in eigenspectrum 3 is fitted
well by a single Gaussian, and all lines share the same width and shift.
This finding strongly supports our proposition, based on the
phenomenological analysis in §4.1, that eigenspectrum 3 represents
an independent, physically distinct emission component
characterized by intermediate velocities.

Our fits yield a list of emission-line intensity ratios, one 
that, as in Veron-Cetty et al. (2004) and Dong et al. (2011), 
can be easily used as a template to model real quasar spectra. 
If only the Fe II lines in the list are used, it works as an op- 
tical Fe II template. This application will be discussed in the 
second paper of this series. A more interesting application is 
to combine both the Fe II and the Balmer lines into a single 
template for the intermediate-line region. Such an approach 
would assume that the Hβ to Fe II intensity ratio is roughly
constant in the intermediate-line region for different sources.
This is possible if the clouds that emit intermediate-velocity
lines have large column densities, as suggested by Ferland et
al. (2009), who predict that the outward emission reaches an
asymptotic value of Fe II/Hβ ≈ 3 (see their Figure 3). If this
assumption is tenable, the Fe II emission can be used to constrain
the intermediate-width component of Hβ, thereby enabling
a more physically motivated strategy for decomposing
the Hβ profile. This is relevant for black hole mass estimation
and will be explored in the third paper of this series.

5. SIMULATIONS 

Simulations with artificial spectra are used in the literature 
to test the validity and limitations of using the SPCA method 
for physical interpretation (e.g., Mittaz et al. 1990; Brotherton 
et al. 1994; Shang et al. 2003). While SPCA is successful in 
recognizing independent sets of correlated emission features, 
it is widely acknowledged that the interpretation is severely 
complicated by cross talk (e.g., the negative narrow features 
in our eigenspectrum 3; Figure 2(d)). More seriously, dif- 
ferent eigenspectra, such as SPC1 and SPC3 in Shang et al. 
(2003) or the intermediate-line region component in Broth- 
erton et al. (1994), cannot be uniquely attributed to distinct 
physical components. While these eigenspectra can be as- 
cribed to independently varying components, as we do in our 
study, this interpretation is not unique. They can also be pro- 
duced by a single emission-line component whose width be- 
comes broader with lower peak flux (see Figure 13 of Shang 
et al. 2003). 

In §3 above, we introduce the fractional-contribution spec- 
tra matrix P (calculated by Equation [C7] in Appendix C), 
which, as shown in §4.1, facilitates recognition of emission 
features in the eigenspectra. In this section we add P to simu- 
lations to investigate if it can be helpful in (1) distinguishing 
cross talk and (2) constraining the single emission-line region 
model. 

In order to mimic real SDSS data, we generate artificial 
spectra using the same method described in §3.3 of Hu et 
al. (2008b). We add Gaussian random noise using a realis- 
tic noise pattern that is generated by scaling a real error array 
taken from SDSS observations to match the desired S/N of 
the simulation, and we adopt a realistic mask array. To match 
our SPCA sample, the redshift and S/N of the spectra are 
randomly set to be uniformly distributed as z ~ U(0.1,0.67) 



and S/N ~ U(10,20). Fe II emission is modeled by Gaussian 
broadening, scaling, and shifting the template constructed by 
Boroson & Green (1992). All the other emission-line compo- 
nents have a Gaussian profile. The distributions of EWs are 
chosen to be consistent with those in the real SDSS sample. In 
each case below, 1000 spectra are generated for SPCA. Table 
3 summarizes the models and parameters of each simulation 
below. 

5.1. Cross Talk 

The aim of the first series of simulations is to study the behavior
of the fractional-contribution spectrum, to see whether
it can help distinguish cross talk from a genuine inverse
correlation when absorption-like features such as those
in eigenspectrum 3 are detected. We begin with the simplest
case, a spectrum consisting of a continuum set to unity and
only two emission-line components, Fe II and [O III]. The
FWHMs of the lines are fixed to 1500 km s$^{-1}$ and 400 km s$^{-1}$,
respectively, and no shifts are considered. Only the EWs vary:
EW(Fe II) ~ U(25,75) Å, and according to how EW([O III])
is generated, three different simulations are performed (Table
3, case 1).

(a) The EW of [O III] is independent of that of Fe II:

$\mathrm{EW}([\mathrm{O\,III}]) = \mathrm{EW}_1 \sim U(0, 20)$ Å. (1)

(b) The EW of [O III] is dependent on that of Fe II:

$\mathrm{EW}([\mathrm{O\,III}]) = \mathrm{EW}_2 = 20 - 0.2\,\mathrm{EW}(\mathrm{Fe\,II})$. (2)

(c) The EW of [O III] is semi-dependent on that of Fe II:

$\mathrm{EW}([\mathrm{O\,III}]) = (\mathrm{EW}_1 + \mathrm{EW}_2)/2$. (3)
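
The three prescriptions in Equations (1)-(3) can be written compactly as below. This is a small sketch of the EW draws only, not of the full spectrum generation; the function name and the seed are hypothetical.

```python
import random

def draw_oiii_ew(case, ew_fe, rng):
    """EW of [O III] in Angstroms for simulation case 'a', 'b', or 'c'."""
    ew1 = rng.uniform(0.0, 20.0)      # Equation (1): independent of Fe II
    ew2 = 20.0 - 0.2 * ew_fe          # Equation (2): fully determined by Fe II
    if case == "a":
        return ew1
    if case == "b":
        return ew2
    if case == "c":
        return 0.5 * (ew1 + ew2)      # Equation (3): semi-dependent
    raise ValueError("case must be 'a', 'b', or 'c'")

rng = random.Random(7)
ew_fe = [rng.uniform(25.0, 75.0) for _ in range(1000)]   # EW(Fe II) ~ U(25,75)
ew_oiii = {c: [draw_oiii_ew(c, e, rng) for e in ew_fe] for c in "abc"}
```

In case (b) EW([O III]) spans 5 to 15 Å over the allowed range of EW(Fe II), and in case (c) half of its scatter comes from the independent draw.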

Figure 7 shows the results of the three simulations. The 
three eigenspectra all show positive Fe II emission and neg- 
ative [O III] lines. Although the strengths of the negative 
[O III] features relative to Fe II are different in the three 
cases, nothing essential can be distinguished. But the three 
fractional-contribution spectra show remarkable differences 
at the positions of [O III] lines: zero in case (a), close to 
100% in case (b), and ~ 40% in case (c). Thus the fractional- 
contribution spectrum, which describes the contribution of 
eigenspectra to emission features, effectively distinguishes 
between cross talk and real correlation. 

Eigenspectrum 3 of our sample (Figure 2(d)) resembles 
case (a) here. The fractional-contribution spectrum goes al- 
most to zero for [O III], meaning that the negative [O III] fea- 
tures arise only from cross talk and do not reflect any intrinsic 
inverse correlation between Fe II and [O III]. But where is 
BG92's EV1 in our eigenspectra? We believe that this signa- 
ture is absent from our sample because of selection effects. 
By design, our sample is biased toward objects with strong Fe
II (§2.2), and hence we do not expect to find a strong inverse
correlation between the strengths of Fe II and [O III]. Figure
8 confirms this expectation; EW(Fe II) and EW([O III]) are
only correlated at the level of $r_P$ = -0.175. This correlation is
much weaker than those between Balmer lines, narrow emission
lines, or Fe II lines, and thus does not appear in the first three
eigenspectra.

5.2. The Single Hβ Component Model

Table 3
Models and parameters of the simulations

Case  Power law^a   Fe II^b EW  [O III]^b EW                 Hβ EW         Hβ FWHM                     Hβ_BC^b EW  Results
1(a)  ...           U(25,75)    U(0,20)                      ...           ...                         ...         Fig. 7(a)
1(b)  ...           U(25,75)    20-0.2EW(Fe II)              ...           ...                         ...         Fig. 7(b)
1(c)  ...           U(25,75)    (U(0,20)+20-0.2EW(Fe II))/2  ...           ...                         ...         Fig. 7(c)
2     U(-2.5,-0.5)  U(25,75)    ...                          U(20,100)     10^3.35 × EW(Hβ)/EW(Fe II)  ...         Fig. 10
3     U(-2.5,-0.5)  U(25,75)    U(0,20)                      0.5EW(Fe II)  FWHM(Fe II)                 -40α        Fig. 11

Note. ... means not included in the specific model. EWs are in Å and FWHMs in km s$^{-1}$.
a $f_{5100}$ is fixed to unity.
b The FWHMs of Fe II, [O III], and Hβ_BC are 1500, 400, and 4500 km s$^{-1}$, respectively.

Figure 7. Results of the simulations for cross talk. Panels (a), (b), and (c) show the resultant eigenspectra when the EW of [O III] depends differently on the EW
of Fe II, as indicated by the labels above the panels. Note the different fractional contribution (p) at the wavelength of [O III].

Before describing the second simulation, which is aimed at
ruling out the single Balmer line component model, we plot
the correlation diagram for FWHM(Hβ_BC) versus R_Fe, defined
as the ratio between the EW of Fe II and the EW of Hβ_BC
(Figure 9). There is a rather strong inverse correlation ($r_P$ =
-0.507), which, from an OLS bisector fit, can be described by

$\mathrm{FWHM}(\mathrm{H}\beta_{\rm BC}) = 10^{(3.35 \pm 0.01)}\, R_{\rm Fe}^{(-1.05 \pm 0.02)}$. (4)

This correlation has been found before (e.g., Boroson &
Green 1992; Sulentic et al. 2000a, and references therein),
but its physical interpretation is still far from conclusive.
There are two possible phenomenological interpretations.
The straightforward explanation is that there is only a single
Hβ component whose width varies systematically with
the relative strength of Fe II and Hβ. Alternatively, Hβ has
two kinematic components, and the intensity of the narrower
one scales with that of Fe II by some fixed ratio, while the
broader one varies independently. In this scenario, larger R_Fe
means that Hβ has a stronger narrow component relative to its
broader component, making the whole Hβ profile narrower.
Previous SPCA studies have claimed that eigenspectra such
as our eigenspectra 1 and 3 can be produced by either of the
models above. There was no definitive way to discriminate
between the two.



Here we aim to use our SPCA results to distinguish between
these two phenomenological models, with the help of
the fractional-contribution spectra P. We generate artificial
spectra that consist of three emission components (Table 3,
case 2):

(1) Fe II emission whose EW ~ U(25,75) Å;

(2) an Hβ component whose EW ~ U(20,100) Å and
FWHM = $10^{3.35} R_{\rm Fe}^{-1}$, according to Equation (4);

(3) a power-law continuum whose flux density $f_{5100}$ is fixed
to unity but whose slope varies in the range α ~ U(-2.5,-0.5).

A varying power law is added here to mimic real spectra more
realistically than in the first series of simulations in the last
subsection, and it helps us achieve our goal, as shown below.
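
The parameter draws for this single-component model can be sketched as below. The width relation uses the rounded form adopted in the simulation, with exponent -1 rather than the fitted -1.05 of Equation (4), and the result is presumably in km s$^{-1}$ given the other line widths quoted in the text. The function names and the seed are hypothetical.

```python
import random

def fwhm_from_rfe(r_fe, log_norm=3.35, slope=-1.0):
    """FWHM of the single H-beta component (presumably km/s) from R_Fe."""
    return 10.0 ** log_norm * r_fe ** slope

def draw_case2(rng):
    """One set of parameters for simulation case 2."""
    ew_fe = rng.uniform(25.0, 75.0)       # EW(Fe II) in Angstroms
    ew_hb = rng.uniform(20.0, 100.0)      # EW(H-beta) in Angstroms
    alpha = rng.uniform(-2.5, -0.5)       # power-law slope of the continuum
    r_fe = ew_fe / ew_hb                  # R_Fe = EW(Fe II) / EW(H-beta)
    return {"ew_fe": ew_fe, "ew_hb": ew_hb, "alpha": alpha,
            "r_fe": r_fe, "fwhm_hb": fwhm_from_rfe(r_fe)}

rng = random.Random(42)
sample = [draw_case2(rng) for _ in range(1000)]
```

With these ranges R_Fe lies between 0.25 and 3.75, so the simulated widths run from roughly 600 to 9000 km s$^{-1}$.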

Figure 10 shows the mean spectrum and the first three eigenspectra
given by SPCA. As in Figure 2, the lower panels
show the proportion of intrinsic variation and the fractional-contribution
spectra. Apart from the [O III] lines, which are not
included in the simulation, the mean spectrum and proportion
of intrinsic variation here resemble those in Figure 2.

Figure 8. EW of Fe II versus EW of [O III] for our sample used for SPCA

is the fit to the data as given in Equation (4).



There are three main differences between these SPCA results
based on simulated spectra and those derived from
real spectra. First, eigenspectrum 1 here shows only a
clean power law; the emission lines that correlate with the
continuum are missing. Second, eigenspectrum 2 highlights
the wings of the Balmer lines. And third, eigenspectrum 3,
which contains Fe II emission, shows a "W" shape at Hβ, and,
as shown by the fractional-contribution spectrum, contributes
substantially to the variation of the Hβ wings.

In conclusion, a single Balmer emission-line component
whose width simply correlates with its strength with respect
to Fe II generates simulated eigenspectra that do not
match well with our observed results in any of the first three
eigenspectra and fractional-contribution spectra.

5.3. Reproduction of Our SPCA Results 

The simulation above suggests that neither a single Balmer 
emission-line component nor two independently varying 
components can produce our SPCA results shown in Figure 
2. A Balmer line component that correlates with the power- 
law continuum plus another that correlates with Fe II emission 
are needed. We demonstrate this using the simulation below. 

As listed on the last line of Table 3, the artificial spectra are
composed of the following five components:

(1) a power-law continuum with specific flux density $f_{5100}$
fixed to unity but slope α ~ U(-2.5,-0.5);

(2) Fe II emission with EW ~ U(25,75) Å and a fixed
FWHM = 1500 km s$^{-1}$;

(3) [O III] emission with EW ~ U(0,20) Å and a fixed
FWHM = 400 km s$^{-1}$;

(4) a Balmer component with EW set to 0.5EW(Fe II) and
FWHM fixed to that of Fe II;

(5) a broader Balmer component with EW = -40α and a
fixed FWHM = 4500 km s$^{-1}$.

The fourth component depends on the second, and the fifth is
determined by the first. Thus, there are only three free parameters:
the slope of the power-law continuum, the strength of
the Fe II emission, and the strength of the narrow [O III] lines.
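
A noise-free toy version of this five-component model is sketched below to show how the three free parameters fix the whole spectrum. Several simplifications are assumptions made here, not part of the simulation described above: Fe II is a single Gaussian placed at 4570 Å rather than a broadened template, only [O III] λ5007 and Hβ are included, and no noise or mask is applied. Because $f_{5100}$ is unity, each line is normalized so that its integrated flux equals its EW.

```python
import numpy as np

C_KMS = 299792.458                      # speed of light in km/s

def gaussian_line(wave, center, fwhm_kms, ew):
    """Gaussian line whose integrated flux equals ew (continuum at 5100 A = 1)."""
    sigma = center * fwhm_kms / C_KMS / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    norm = ew / (sigma * np.sqrt(2.0 * np.pi))
    return norm * np.exp(-0.5 * ((wave - center) / sigma) ** 2)

def toy_spectrum(wave, alpha, ew_fe, ew_oiii):
    """Five components driven by three free parameters (toy stand-in)."""
    continuum = (wave / 5100.0) ** alpha                          # (1) power law
    fe = gaussian_line(wave, 4570.0, 1500.0, ew_fe)               # (2) Fe II stand-in
    oiii = gaussian_line(wave, 5006.84, 400.0, ew_oiii)           # (3) [O III] 5007
    hb_int = gaussian_line(wave, 4861.33, 1500.0, 0.5 * ew_fe)    # (4) tied to Fe II
    hb_vb = gaussian_line(wave, 4861.33, 4500.0, -40.0 * alpha)   # (5) tied to slope
    return continuum + fe + oiii + hb_int + hb_vb

wave = np.arange(3500.0, 5500.0, 0.5)   # rest wavelength grid in Angstroms
rng = np.random.default_rng(3)
alpha = rng.uniform(-2.5, -0.5)
spec = toy_spectrum(wave, alpha, rng.uniform(25.0, 75.0), rng.uniform(0.0, 20.0))
```

Since α is negative over the adopted range, the EW of the broader Balmer component, -40α, runs from 20 to 100 Å.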

Figure 11 shows the derived SPCA results. The first three
eigenspectra, added up, account for 82.40% of the total vari- 
ation, and are sufficient for describing the intrinsic variation 
(82.26% of the total). This is consistent with the number of 
free parameters included in the simulation. Comparing the re- 
sults here with those in Figure 2, this simulation successfully 
reproduces the SPCA results for the real SDSS spectra, in- 
cluding almost all of the main features in the mean spectrum, 
the first three eigenspectra, and the fractional-contribution 
spectra. 

These simulations strongly confirm the validity of the 
fractional-contribution spectrum and verify our interpretation 
of the eigenspectra in §4. They support the notion that two 
Balmer emission-line components are needed to reproduce 
our SPCA results: (1) a broader one that correlates with the 
power-law continuum and produces no Fe II emission and (2) 
a narrower one that has the profile of, and correlates with, Fe 
II. 

6. RESULTS FOR LARGE Fe II VELOCITY SHIFTS 

Aside from the primary sample of quasars with small Fe 
II velocity shift, four other samples with larger Fe II velocity 
shifts are also established in §2.2. This section presents the 
results of SPCA for these four samples, which can be under- 
stood easily in light of the analysis and simulations for the 
primary sample above. 

Figure 12 shows the first three eigenspectra and corresponding
fractional-contribution spectra for the four new velocity
bins. Sample B (with Fe II velocity shifts between 250 and
750 km s$^{-1}$) is roughly the same size (794 objects) as the primary
sample. The results of SPCA for this sample resemble
those for the primary sample, for example, in terms of the


[Figure 10 panels: (a) mean spectrum and intrinsic variation; (b) eigenspectrum 1, 70.5%; (c) eigenspectrum 2, 5.9%; (d) eigenspectrum 3, 3.9%. Horizontal axis: Rest Wavelength (Å), 3500-5500.]

Figure 10. Results of the SPCA simulation for a single-component Hβ line whose width correlates with R_Fe. The labels are the same as those in Figure 2. While
the mean spectrum resembles that of Figure 2, the first three eigenspectra and their corresponding fractional-contribution spectra differ. Note that the simulation
here does not take into account the [O III] lines. See text for details.



contributions of the eigenspectra over the entire wavelength
range (59.9%, 15.3%, and 4.0%), the shapes and intensities of
the first three eigenspectra, and the fractional-contribution spectra.
Thus, the first three eigenspectra represent the same emission
components described in §4.1.

In Figure 13, we compare the first three eigenspectra of
sample B (in red, vertically shifted) to those of the primary
sample (in black). The rest wavelengths of major emission
lines are marked by dotted lines. Top and middle panels show
that eigenspectra 1 and 2 of the two samples are almost the
same. None of the major emission lines (very broad Balmer
lines in eigenspectrum 1; narrow Balmer lines, He II, and [O
III] lines in eigenspectrum 2) have a significant velocity shift
between the two samples. The only notable difference is seen
in eigenspectrum 3. The Fe II lines and the intermediate-width
Balmer lines of sample B are redshifted by ~400 km s$^{-1}$
with respect to the primary sample. This is expected because
the two samples are divided by Fe II velocity shift and the
third eigenspectrum represents the intermediate-width emission
lines, which mainly include Fe II. It provides additional
evidence that the measurement of Fe II velocity shift in Hu
et al. (2008b) is reliable in these two samples. Note that the
negative features of [O III] arise from cross talk, not redshift.

The three bins with larger Fe II velocity shift contain relatively
fewer objects: 338, 203, and 110 in samples C, D, and
E, respectively. The eigenspectra and fractional-contribution
spectra of these three samples are very similar. Their first
two eigenspectra and fractional-contribution spectra resemble
those of the primary sample (judging from the fractional-contribution
spectra, note that the [O III] lines in eigenspectra
1 arise just from cross talk), while fractional-contribution
spectra 3 are different. Eigenspectra 3 of these three samples
contribute little in terms of Fe II emission. Their contributions
over the entire wavelength range are also minimal.
Thus, eigenspectra 3 of samples C, D, and E do not record
the intermediate-width component. This is reasonable because
EW(Fe II) in the three samples is small and Hβ is rather
broad (Hu et al. 2008b) and tends to have single-Gaussian profiles
(Figure 1(a) of Hu et al. 2008a). The intermediate-width
component is therefore expected to be weak, to contribute little
to the total variation, and to be easily smeared out by nonlinear factors.



This interpretation is supported by the fact that the first two 
eigenspectra have already contributed a larger proportion of 
variation (> 80%) than the first three eigenspectra of the pri- 
mary sample (78.4%). 

In conclusion, the results of SPCA for sample B resem- 
ble those for the primary sample, except that eigenspectrum 
3 is redshifted. For the other three samples, only the first two 
eigenspectra are important; they record the power-law contin- 
uum + very broad emission lines and narrow emission lines, 
respectively. 



7. DISCUSSION 



[Figure 12 panels, titled "Eigenspectra and Fractional-Contribution Spectra": one row per velocity bin, with columns for the first, second, and third eigenspectra. Percentages as labeled in the panels: sample B (794 objects) 59.9%, 15.3%, 4.0%; sample C (338 objects) 60.6%, 15.3%, 2.1%; sample D (203 objects) 63.9%, 20.1%, 1.9%; sample E (110 objects) 65.1%, 19.0%, 1.7%. Horizontal axis: Rest Wavelength (Å).]

Figure 12. The first three eigenspectra and corresponding fractional-contribution spectra for the four bins of large Fe II velocity shift. The range of Fe II velocity
shift in each bin is labeled on the left of each row. The percentage in each panel indicates the contribution of the eigenspectrum over the entire wavelength range.
The number of objects in each of the four bins is given in the panels of the middle column.



[Figure 13 legend: Primary (-250 to 250 km s$^{-1}$); Sample B (250 to 750 km s$^{-1}$). Horizontal axis: Rest Wavelength (Å).]

Figure 13. Comparison between the first three eigenspectra of the primary
sample (in black) and sample B (in red). The eigenspectra of sample B are
shifted vertically for clarity. The dotted lines mark the rest wavelengths of
major emission lines encoded in the eigenspectra (top panel: Balmer lines;
middle panel: Balmer lines, He II λ4686, [O III] λ4363, and [O III] λλ4959,
5007; bottom panel: Balmer lines and Fe II lines). Note the redshift of
eigenspectrum 3 of sample B.



Our SPCA analysis is based on the quasar sample of Hu 
et al. (2008b), which is intentionally biased toward objects 
with strong Fe II emission [EW(Fe II) > 25 Å] to facilitate
measurement of Fe II velocity shifts. Given this situation, it 
is important to ask whether the principal result of this paper, 
namely that the Balmer lines contain two physically distinct 
components, is unique to our sample or applies to the quasar 
population in general. The key test of this scenario is to per- 
form a detailed decomposition of the broad H/3 profile, to 
isolate the two kinematic components. We defer this analy- 
sis to the second paper of this series, where we will explore 
the feasibility of using the eigenspectrum 3 derived here as a 
template to model the intermediate-velocity component of the 
spectrum. 

In the meantime, we will use two samples selected differ- 
ently from that used in our SPCA analysis to argue that our 
main conclusions are robust against sample selection effects. 

7.1. Intermediate-line Region 

Our analysis posits that the traditional BLR consists of two
components, one with intermediate velocities in which Hβ
and Fe II coexist with roughly constant relative intensity, and
another characterized by larger velocities that emits Hβ but
no Fe II. The relative contribution of these two components
varies from source to source. A simple consequence of this
picture is that the EW of the total broad Hβ line should be
at least some fixed factor times that of Fe II. We construct a sample
from the SDSS DR5 quasar catalog using the same criteria as
Hu et al. (2008b), as described in §2.2, except that we impose
no restriction based on EW(Fe II). Thus, this sample is not
biased toward objects with strong Fe II emission.

Figure 14. EW of Hβ_BC versus that of Fe II. This sample is selected from the
SDSS DR5 quasar catalog using the same criteria as those in Hu et al. (2008b),
except that no cut is made for EW(Fe II). The solid line denotes EW(Hβ_BC)
= 0.5 EW(Fe II). Fewer than 1% of the sources lie below this line.

Figure 14 shows the relation between Hβ_BC and Fe II EWs
for this sample. As in Figure 4, we measure Hβ_BC by
Gauss-Hermite fitting, which accounts for the full broad Hβ
profile. The distribution of points in the plot shows a clear lower
boundary, which is approximately delineated by EW(Hβ_BC)
= 0.5 EW(Fe II). Only 47 sources out of 4757 (fewer than
1%) lie below the line. The existence of this lower boundary
is consistent with our two-component scenario if the strength
of the intermediate-width Hβ component is roughly a fixed
fraction of Fe II. Furthermore, our simulations confirm that
such an assumption reproduces well the eigenspectra of the
real data.

7.2. Correlation Between Strength of Very Broad Hβ and α

A correlation between the EW of broad Hβ and the optical
continuum slope α has seldom been reported in the literature,
and the few papers that have mentioned it have given
inconsistent results. Srianand & Kembhavi (1997), for example,
found no correlation between optical spectral index
and the EW of Hβ, but Francis et al. (2001) reported that the
EW of Hβ increases as the continuum becomes bluer. Our
finding that the EW of very broad Hβ correlates with α does
not necessarily contradict previous studies, because none of
them decomposed the Hβ profile and the samples differ.
Richards et al. (2003) found that Hβ
line width decreases in quasars with redder continuum; this
trend is qualitatively consistent with our scenario insofar as
bluer objects exhibit a stronger very broad Hβ component and
hence a broader overall line profile.

Figure 15. Correlation diagram for the EW of the entire broad Hβ line versus
the slope of the power-law continuum (α, defined as f_λ ∝ λ^α) for sources
with EW(Fe II) < 25 Å in Figure 14. The inverse correlation is strong. The
solid line shows the OLS bisector fit.

The sources with EW(Fe II) < 25 Å in Figure 14 comprise
quasars with weak Fe II emission; they formally lie outside
the selection criterion of Hu et al. (2008b). Within the
framework of this study, this subsample of weak-Fe II sources
should have a weak intermediate-width Hβ component, and
their broad Hβ emission should be dominated by the very
broad component. This subsample is convenient for testing
the inverse correlation between the strength of the very broad
Hβ component and the slope of the power-law continuum,
without having to perform accurate profile decomposition.

Figure 15 confirms our expectations. We find a strong inverse
correlation between EW(Hβ_BC) and α, with Pearson's
correlation coefficient r_P = −0.463. An OLS bisector fit yields

$$\mathrm{EW}(\mathrm{H}\beta_{\rm BC}) = (17.5 \pm 2.7) - (41.6 \pm 1.3)\,\alpha. \qquad \mathrm{(5)}$$

This empirical result motivated one of the input criteria (the
fifth) for the simulation described in §5.3.
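The two statistics quoted above can be sketched in a few lines. The following is a minimal illustration only, assuming the standard OLS bisector slope formula (the line bisecting the Y-on-X and X-on-Y regressions); the function names and the toy data are hypothetical and are not the measurements plotted in Figure 15.

```python
import math

def _moments(x, y):
    """Return means and centered sums of squares/products of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / math.sqrt(sxx * syy)

def ols_bisector(x, y):
    """Return (intercept, slope) of the OLS bisector line."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    b1 = sxy / sxx  # slope of the regression of y on x
    b2 = syy / sxy  # slope of the regression of x on y, expressed as dy/dx
    slope = (b1 * b2 - 1.0 + math.sqrt((1.0 + b1 ** 2) * (1.0 + b2 ** 2))) / (b1 + b2)
    return my - slope * mx, slope

# Toy, noise-free data built from the coefficients of Equation (5);
# purely illustrative, not the sample in Figure 15.
alpha = [-2.0, -1.5, -1.0, -0.5]
ew = [17.5 - 41.6 * a for a in alpha]
intercept, slope = ols_bisector(alpha, ew)
print(intercept, slope, pearson_r(alpha, ew))
```

For noise-free linear input the bisector reduces to the generating line; with scatter, it lies between the two ordinary regressions.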

A bluer continuum has more ionizing photons relative to the
non-ionizing continuum at 5100 Å, to which the EWs in the
present paper all refer. Thus, an inverse correlation between
the EW of an emission line and α arises naturally if the
line-emitting clouds are optically thick to, and photoionized by,
the same continuum that we see, and the clouds emit
isotropically. On the other hand, the lack of any correlation between
the intermediate-width component and the continuum slope
reflects the complexity of the dynamics (Hu et al. 2008b) and
physics (e.g., Baldwin et al. 2004; Ferland et al. 2009, and
references therein) of the Fe II-producing gas. These two
different dependences on spectral slope, in conjunction with the
distinct locations and dynamics implied by their different
velocity widths and shifts, indicate that the two components may
respond very differently to continuum variations. This has
been suggested by the reverberation analysis of Fe II emission
in Ark 120 (Kuehn et al. 2008), and also by recent studies
that investigate the time delay between different velocity
components of Hβ and the continuum (Zhang 2011; Wang
& Li 2011). Velocity-resolved time-delay measurements in
some sources also show that Hβ originates from structures
more complicated than a single virialized region (e.g., Bentz
et al. 2010; Denney et al. 2010, and references therein).

The width of the broad Hβ emission line has been widely
used to estimate virial velocities to derive BH masses in
AGNs. Our two-component model for the Hβ-emitting region
raises an important question: which component better
traces the virialized portion of the BLR? Our SPCA results
suggest that the very broad component is more appropriate
for BH mass estimation. We will investigate this problem in
the third paper of this series.

8. SUMMARY 

We select a sample of 816 quasars with small Fe II velocity
shifts from SDSS and perform spectral principal component
analysis (SPCA) on this sample in the rest wavelength range
3500-5500 Å. Apart from adjusting some details in the SPCA
algorithm, we introduce, for the first time, a parameter called
the fractional-contribution spectrum that measures the
proportion of the variation accounted for by an eigenspectrum in
a wavelength bin. We demonstrate the utility of this new
parameter in helping to interpret the eigenspectra. We explore
the correlations between the weights of the eigenspectra and
various physical quantities to confirm the physical meaning
of the eigenspectra. We perform a bootstrap analysis, spectral
fitting of the eigenspectra, and Monte Carlo simulations
to test the uncertainty of the eigenspectra, the validity of the
fractional-contribution spectrum we introduced, and the
two-component model of the Hβ-emitting region we propose.

Our principal findings from SPCA are as follows. 

1. The first three eigenspectra have clear physical meanings.
Eigenspectrum 1 represents the power-law continuum,
very broad Balmer emission lines, and the correlation between
them. Eigenspectrum 2 represents narrow emission
lines. Eigenspectrum 3 consists of Fe II lines and Balmer lines
with kinematically similar intermediate velocities.

2. The fractional-contribution spectrum is a powerful tool
for diagnosing the emission features represented in each
eigenspectrum, as well as for recognizing spurious features
that arise from cross talk.

3. The broad Hβ line consists of two physically distinct
components: a very broad component whose strength
correlates with the slope of the optical power-law continuum, and
an intermediate-width component that has the profile of, and
correlates with, Fe II.

While our current sample is biased toward objects with
strong Fe II emission, we argue that our overall findings are
not strongly affected by sample selection bias and apply to the
general quasar population. This means that
the strength of the intermediate-width Hβ component varies
from source to source and could vanish. In extreme cases Hβ
has only a single Gaussian component. We will rigorously
test this conclusion in a forthcoming work, where we will
perform detailed spectral decomposition using the new template
spectrum for Fe II emission and intermediate-velocity Balmer
lines derived from eigenspectrum 3.

The two-component nature of the broad Hβ line raises
important concerns about the robustness of current virial BH
mass estimates that make use of the Hβ line width. This issue
will be explored in another forthcoming publication.
APPENDIX

A. EVIDENCE FOR REDSHIFTED Fe II EMISSION 
IN QUASARS FROM COMPOSITE SPECTRA 

Sulentic et al. (2012) called into question the measurement
of Fe II velocity shifts (v_Fe) in Hu et al. (2008b). They raised
two criticisms. First, they argued that the majority of the
sources in Hu et al. (2008b) do not have enough S/N to yield
a reliable measurement of v_Fe. Second, they noted that Hu et
al. (2008b) did not include He II emission in their fits.
Sulentic et al. (2012) generated composite spectra, which have
high S/N, of several subsamples defined by their 4D
Eigenvector 1 formalism, and then measured v_Fe from fits of the
composite spectra that include the He II line. They concluded
that the Fe II emission in their composite spectra does not have
significant velocity shifts. In this Appendix, we adopt similar
fitting methods, take He II into consideration, measure v_Fe for
the five composite spectra generated in Hu et al. (2008b), and
test the statistical significance of our measurement of v_Fe. We
confirm that our redshift measurements of Fe II are robust.

The details of the five composite spectra are described in
§4.6 of Hu et al. (2008b). Briefly, the composite spectra are
geometric means (Vanden Berk et al. 2001) of the spectra of
quasars in five bins of different v_Fe measured in Hu et al.
(2008b): −250 to 250 km s⁻¹ (A, 1350 objects), 250 to 750
km s⁻¹ (B, 1362 objects), 750 to 1250 km s⁻¹ (C, 590 objects),
1250 to 1750 km s⁻¹ (D, 332 objects), and 1750 to
2250 km s⁻¹ (E, 180 objects). Our fitting method here resembles
that in Hu et al. (2008b) but is improved in two aspects:
(1) the He II line is included and (2) the power-law continuum,
Fe II emission, and other lines are fitted simultaneously.
The left column of Figure A1 shows our results. For each
velocity shift bin, the top panel shows the continuum-subtracted
composite spectrum (black) and best-fit model (red). Besides
the power-law continuum, the model contains the following
components: (1) Fe II emission (blue) modeled by broadening,
scaling, and shifting the I Zw 1 Fe II template constructed
by Boroson & Green (1992), (2) broad emission lines (green)
modeled by a set of Gauss-Hermite functions representing
Hβ λ4861, Hγ λ4340, Hδ λ4102, and He II λ4686, (3) narrow
emission lines (cyan) modeled by a set of single Gaussians
representing Hβ, Hγ, Hδ, [O III] λλ4959, 5007, [O III]
λ4363, and He II λ4686, and (4) wings of the [O III] lines
(magenta) modeled by a set of single Gaussians. The lines
in each set have the same shape and shift, but different
intensities. The bottom panel shows the residuals. The fitting is
performed in the wavelength window 4000-5600 Å.

The significance of the measured v_Fe can be tested by using
a χ² statistic to determine whether the model with varying v_Fe
fits the data better than the model with v_Fe fixed to a specific
value (with other parameters left free). We use the F-test
described in §11.4 of Bevington & Robinson (2003). In the
fitting described above, if v_Fe is free to vary the best fit has
chi-square χ²_free with 1435 degrees of freedom, and the reduced
chi-square is χ²_ν,free = χ²_free/1435. If we fix v_Fe to a constant but
allow all other parameters to vary and fit the composite spectra
again, the new best fit has chi-square χ²_fixed with 1435+1
degrees of freedom. From Equation 11.50 of Bevington &
Robinson (2003), the quantity

$$F_\chi = \left(\chi^2_{\rm fixed} - \chi^2_{\rm free}\right) / \chi^2_{\nu,{\rm free}} \qquad \mathrm{(A1)}$$

follows an F_{1,ν₂} distribution, where ν₂ = 1435 in this case.
Thus, the probability that the model with varying v_Fe does
not improve the fit compared to the model with v_Fe fixed to
a specific value equals the probability of exceeding F_χ in
an F_{1,ν₂} distribution, namely P_F(F_χ; 1, 1435). From Figure
C.5 of Bevington & Robinson (2003), the value of F_χ for
P_F(F_χ; 1, 1435) = 1 − 99.73% is approximately 9 (the ν₂ = ∞
curve); the precise number is 9.03 (footnote 7). Thus, the best-fit
value of v_Fe is significant at more than the 3σ confidence level if

$$\left(\chi^2_{\rm fixed} - \chi^2_{\rm free}\right) / \chi^2_{\nu,{\rm free}} > 9.03, \qquad \mathrm{(A2)}$$

or, equivalently,

$$\chi^2_{\rm fixed} / \chi^2_{\rm free} = F_\chi / \mathrm{dof} + 1 > 1.0063, \qquad \mathrm{(A3)}$$

where dof is the number of degrees of freedom when v_Fe is
free to vary (1435, in this case).
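The step from Equation (A2) to Equation (A3) is simple arithmetic, sketched below. The critical value 9.03 and the 1435 degrees of freedom are taken from the text; the function names and the example chi-square values are hypothetical and serve only to illustrate the criterion.

```python
def f_chi(chi2_fixed, chi2_free, dof_free):
    """Equation (A1): chi-square difference over the reduced chi-square of the free fit."""
    return (chi2_fixed - chi2_free) / (chi2_free / dof_free)

def ratio_threshold(f_critical, dof_free):
    """Equation (A3): chi-square ratio equivalent to a given critical F value."""
    return f_critical / dof_free + 1.0

F_CRIT_3SIGMA = 9.03  # value quoted in the text for P_F(F; 1, 1435) = 1 - 99.73%
DOF_FREE = 1435       # degrees of freedom when v_Fe is free to vary

threshold = ratio_threshold(F_CRIT_3SIGMA, DOF_FREE)
print(round(threshold, 4))  # 1.0063

# Hypothetical fit results: a fixed-v_Fe fit whose chi-square is 1% larger
# than the free fit exceeds the 3-sigma threshold.
example_f = f_chi(chi2_fixed=1.01 * 1500.0, chi2_free=1500.0, dof_free=DOF_FREE)
print(example_f > F_CRIT_3SIGMA)  # True
```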

We fix v_Fe from −200 to 2000 km s⁻¹, obtain the best-fit
chi-square χ²_fixed for each fixed input value of v_Fe, and build
up a χ² curve for each composite spectrum, as shown in the
right column of Figure A1 (crosses). For convenience, the
chi-squares are plotted as (χ²_fixed − χ²_free)/χ²_ν,free. The equivalent
values of χ²_fixed/χ²_free are also labeled on the right of the plot,
for comparison with Figure 1 of Sulentic et al. (2012). The
solid square marks the best-fit result when v_Fe is free to vary.
The inset zooms in around the minimum of the χ² curve. The
3σ confidence interval of the v_Fe measurement can be determined
by the two points where (χ²_fixed − χ²_free)/χ²_ν,free reaches
9.03 (marked by the horizontal dashed line). Fixing v_Fe to
values outside of this velocity interval gives worse χ² than
setting v_Fe free, at the 3σ confidence level. The best-fit value of
v_Fe when it is free to vary and its lower and upper 3σ bounds
are labeled in the panel on the left column.

7 This precise value of F_χ for P_F(F_χ; 1, 1435) = 1 − 99.73% can be obtained
using the software R (R Core Team 2012) by the command qf(0.9973, 1, 1435).

8 Note that the ratio of χ² adopted in Sulentic et al. (2012), 1.24, is much
larger than the value derived here. This led those authors to conclude that the
best-fit value of 730 km s⁻¹ for the velocity shift of their B1 subsample is not
distinguishable from zero shift. It is not clear to us how they derive this large
value for the χ² ratio.

Figure A1. (Left) Fits of composite spectra made by assuming v_Fe is free. The composite spectra were generated in Hu et al. (2008b) by stacking the spectra of
quasars in bins of different measured v_Fe. The range in each bin is labeled on the left of each row. For each bin, the top panel shows the continuum-subtracted
composite spectrum (black) and best-fit model (red), including Fe II emission (blue), broad emission lines (green), narrow emission lines (cyan), and wings of [O
III] lines (magenta). The bottom panel shows the residuals. (Right) χ² curve (crosses) for the fits with fixed v_Fe; the inset zooms in around the minimum. The
horizontal dashed line marks where the difference in χ² has 3σ significance. The value of the best-fit v_Fe (solid square) and its 3σ confidence interval (where the
χ² curve intersects the dashed line) are labeled in the panel showing the fit. Note that the range of the abscissa of the χ² curves for the last two composite spectra
differs from that for the first three.
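Reading the 3σ interval off a sampled χ² curve amounts to locating the two crossings of the 9.03 level on either side of the minimum. The sketch below does this by linear interpolation between grid points; the interpolation scheme, the function name, and the parabolic toy curve are assumptions made for illustration and are not taken from the fits shown in Figure A1.

```python
def crossing_interval(v_grid, f_values, threshold=9.03):
    """Return (v_best, v_lower, v_upper) for a sampled F_chi curve.

    v_best is the grid point with the smallest F_chi; the bounds are where the
    curve first rises above `threshold` on each side, linearly interpolated.
    A bound is None if the curve never crosses the threshold on that side.
    """
    i_min = min(range(len(f_values)), key=lambda i: f_values[i])

    def interpolate(i_in, i_out):
        # Linear interpolation between a point below and a point above threshold.
        v_in, v_out = v_grid[i_in], v_grid[i_out]
        f_in, f_out = f_values[i_in], f_values[i_out]
        return v_in + (threshold - f_in) * (v_out - v_in) / (f_out - f_in)

    lower = upper = None
    for i in range(i_min, 0, -1):                 # walk toward smaller velocities
        if f_values[i] <= threshold < f_values[i - 1]:
            lower = interpolate(i, i - 1)
            break
    for i in range(i_min, len(v_grid) - 1):       # walk toward larger velocities
        if f_values[i] <= threshold < f_values[i + 1]:
            upper = interpolate(i, i + 1)
            break
    return v_grid[i_min], lower, upper

# Toy parabolic curve with an arbitrary minimum and width (not measured values).
v_grid = [float(v) for v in range(400, 701, 10)]
f_values = [((v - 550.0) / 20.0) ** 2 for v in v_grid]
best, lower, upper = crossing_interval(v_grid, f_values)
print(best, lower, upper)
```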

For all five composite spectra, χ²_fixed with v_Fe fixed
to zero is significantly larger than χ²_free with v_Fe allowed
to vary. Thus, the Fe II emission in the composite spectra does
exhibit a significant velocity shift. The resultant values of v_Fe
(and their 3σ confidence intervals) for the five composite spectra are
210!*, 550!!$, 890!|°, HHtgg, and 1480!$] km s" 1 . The 
measured v_Fe of the first three composite spectra are consistent
with the velocity shift range of the bins, while those of
the last two are slightly lower. This discrepancy is probably
caused by the enhancement of the unshifted spectral features
(e.g., narrow emission lines and host galaxy component) during
the stacking. Figure 12 of Hu et al. (2008b) shows that
Ca II absorption lines, produced by host galaxy starlight, are
prominent in composites D and E. This shortcoming of the
composite spectra has already been discussed in §4.6 of Hu
et al. (2008b); it also explains why the velocity shifts of Fe II
cannot be seen by direct visual inspection.

He II emission and host galaxy contamination may affect
the v_Fe measurement in some individual cases. A thorough
analysis of this problem, taking both factors into consideration,
is beyond the scope of this paper. However, the fact that
a velocity shift is seen in the composite spectra, with a value
similar to the velocity shift range used to stack each composite
spectrum, suggests that the measurement of v_Fe in Hu et al.
(2008b) is reliable for the majority of quasars. The values of
v_Fe are thus suitable for the SPCA study in this paper. In fact,
Hu et al. (2008b, their §3.3) had already tested the reliability
of their v_Fe measurements for the S/N of SDSS spectra. Moreover,
the influence of He II emission is mitigated by the fact
that Fe II is fitted over a rather wide wavelength range.

We do not have a definitive explanation for the contradictory
results obtained by Sulentic et al. (2012). Part of the
problem may lie in the manner in which they constructed their
composite spectra, which are defined by parameters of their
4D Eigenvector 1 formalism, namely Hβ width and Fe II/Hβ
intensity ratio. Figure 9 of Hu et al. (2008b) shows that these
parameters are poorly correlated with v_Fe. Thus, composite
spectra generated in bins of these spectral properties are not
equivalent to those constructed from bins in v_Fe. We suspect
that this may be the reason that Sulentic et al. (2012) failed to
see the Fe II velocity shifts reported by Hu et al. (2008b).

B. DERIVATION OF THE EIGENSPECTRA 

This Appendix describes the detailed algorithm for deriving
the eigenspectra. The algorithm basically follows that of Yip
et al. (2004a).

After the three steps of spectra preprocessing described in
§3, we prepare three 816 × 1962 matrices: F, S, and M, where
F has element f_ij, the flux of the ith mean-subtracted spectrum
in the jth wavelength bin, and S and M specify the
corresponding flux errors (σ_ij) and masks (m_ij) given by the SDSS spectra.
The goal of SPCA is to calculate the eigenvalues and
eigenvectors of the correlation matrix F^T · F:

$$(\mathbf{F}^{T}\cdot\mathbf{F})\cdot\mathbf{V} = \mathbf{V}\cdot(\mathbf{\Lambda}^{T}\cdot\mathbf{\Lambda}), \qquad \mathrm{(B1)}$$

where F^T denotes the transpose of matrix F, V is an n × n
orthogonal matrix whose columns (v_k, k = 1, ..., n) are the
eigenvectors of F^T · F, Λ is an n × n diagonal matrix diag(λ_1 ··· λ_n),
and λ_k² are the eigenvalues of F^T · F. This can be achieved
using singular value decomposition (SVD):

$$\mathbf{F} = \mathbf{U}\cdot\mathbf{\Lambda}\cdot\mathbf{V}^{T}, \qquad \mathrm{(B2)}$$

where U is an m × n column orthogonal matrix. Defining W =
U · Λ, the mean-subtracted spectra can be reconstructed using
the eigenspectra

$$\mathbf{F} = \mathbf{W}\cdot\mathbf{V}^{T}, \qquad \mathrm{(B3)}$$

where W is an m × n matrix, and w_ik is the weight of the kth
eigenspectrum for the ith mean-subtracted spectrum.

Next, we derive the eigenspectra iteratively, taking the er- 
rors S and masks M into account. 

1. The bad pixels in the mean-subtracted spectra are initially
corrected by mean interpolation, following Yip et al.
(2004a). The flux of a bad pixel in any given spectrum is
replaced by the mean of the fluxes of all the other spectra in the
same wavelength bin, which is zero:

$$f^{c}_{ij} = \frac{\sum_{i',\,\mathrm{good\ pixel}} f_{i'j}}{\sum_{i',\,\mathrm{good\ pixel}} 1} = 0, \qquad \mathrm{(B4)}$$

for all i, j for which m_ij flags a bad pixel. The second equality holds
because F is mean-subtracted. The choice of interpolation
method for the initial bad pixel correction does not
affect the final eigenspectra, but the mean interpolation
adopted here makes the convergence of the eigenspectra
faster (Yip et al. 2004a) and is quite simple (the flux is just
set to zero).

2. The corrected flux matrix F^c is then factorized in the
form of Equation (B2) using the algorithm for SVD in §2.6
of Press et al. (1992). Each column (v_k) of the derived
matrix V is an eigenspectrum; they are arranged in the order
of decreasing eigenvalues λ_k².

3. The mean-subtracted spectra are reconstructed using the
eigenspectra obtained, taking the errors and masks into account.
For the ith spectrum, we find the set of a_ik (the coefficient
of the kth eigenspectrum for the ith spectrum) by minimizing
the quantity

$$\chi^{2} = \sum_{j,\,\mathrm{good\ pixel}} \left(\frac{f_{ij} - \sum_{k} a_{ik} v_{jk}}{\sigma_{ij}}\right)^{2}. \qquad \mathrm{(B5)}$$

This can be solved by Gauss-Jordan elimination (see §2 of
Connolly & Szalay 1999 for details of the derivation). The
reconstructed spectrum is f^rec_ij = Σ_k a_ik v_jk. In practice, only the
first few tens of eigenspectra are needed for reconstructing
quasar spectra (Yip et al. 2004b; Boroson & Lauer 2010). We
find that the first 30 are sufficient for the case here (Figure 1
in §4.1).

4. The bad pixels in F are corrected again, using the
reconstructed spectra obtained in the last step: f^c_ij = f^rec_ij. Then we
cycle back to step 2 with the new F^c.

5. The loop in steps 2-4 iterates until the eigenspectra converge.
Following Yip et al. (2004a), the commonality between
the eigenspectra derived in the nth iteration (V^n) and those in
the (n−1)th iteration (V^(n−1)) is defined as

$$\mathrm{Tr}\left(\mathbf{S}^{n}\cdot\mathbf{S}^{n-1}\cdot\mathbf{S}^{n}\right)/d, \qquad \mathrm{(B6)}$$

where Tr is the trace of a matrix, d is the dimension of the two
sets of eigenspectra for comparison, and S^n is the sum of the
projection operators of V^n,

$$\mathbf{S}^{n} = \sum_{k=1}^{d} \mathbf{v}^{n}_{k}\,(\mathbf{v}^{n}_{k})^{T}, \qquad \mathrm{(B7)}$$

where v^n_k is the kth eigenspectrum derived in the nth iteration.
This quantity is unity if the two sets of eigenspectra are
identical, and is zero if they are disjoint. It is not necessary to
compare the entire set. We compare the first 30 eigenspectra
for consistency with the reconstruction of the spectrum in step
3. After five iterations, Tr(S⁵ · S⁴ · S⁵)/30 = 0.9997 (see Figure
1 of Yip et al. 2004a for comparison). Thus, in this work, all
eigenspectra are obtained with five iterations.
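The convergence measure in step 5 can be evaluated without forming the projection matrices explicitly. The sketch below assumes each set consists of orthonormal vectors, in which case S is a projector (S·S = S) and Tr(S^a · S^b · S^a) = Tr(S^a · S^b) = Σ_{k,l} (v^a_k · v^b_l)². The function names and the four-dimensional toy vectors are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def commonality(basis_a, basis_b):
    """Tr(S_a S_b S_a) / d for two sets of d orthonormal vectors.

    For orthonormal sets the trace equals the sum of squared inner products
    between every vector of one set and every vector of the other.
    """
    d = len(basis_a)
    return sum(dot(a, b) ** 2 for a in basis_a for b in basis_b) / d

e1, e2, e3, e4 = ([1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0])
r = 1.0 / math.sqrt(2.0)
rotated = [[r, r, 0, 0], [r, -r, 0, 0]]   # same subspace as (e1, e2), different basis

print(commonality([e1, e2], [e1, e2]))    # identical sets -> 1.0
print(commonality([e1, e2], [e3, e4]))    # disjoint sets  -> 0.0
print(commonality([e1, e2], rotated))     # same subspace  -> 1.0 up to rounding
```

Because the measure depends only on the subspaces spanned, a rotation of the eigenspectra within the same subspace still returns unity.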

C. THE FRACTIONAL-CONTRIBUTION 
SPECTRUM 

This paper introduces a new quantity, the fractional- 
contribution spectrum, which is useful for understanding 
eigenspectra. Here we present its definition in detail and com- 
pare it to other quantities widely used in the literature. 

The total variation of the sample is defined as the sum of
the squares of the differences between the normalized spectra
and the mean spectrum:

$$\mathrm{var}^{\rm tot} = \sum_{i,j} f_{ij}^{2}. \qquad \mathrm{(C1)}$$

It consists of two parts: one that accounts for the noise in the
original spectra,

$$\mathrm{var}^{\rm err} = \sum_{i,j} \sigma_{ij}^{2}, \qquad \mathrm{(C2)}$$

and another that describes intrinsic variation, var^int = var^tot −
var^err. It is more straightforward to express them as fractions,
such that P^err = var^err/var^tot and P^int = 1 − P^err.

The proportion of the variation accounted for by the kth
eigenspectrum can be calculated as

$$P_{k} = \frac{\lambda_{k}^{2}}{\sum_{k'} \lambda_{k'}^{2}}, \qquad \mathrm{(C3)}$$

where λ_k² is the eigenvalue for the kth eigenspectrum (Mittaz
et al. 1990). This is the quantity used in previous SPCA studies,
and it represents the contribution of an eigenspectrum to
the total variation of the input spectra over the entire
wavelength range. Thus, the first n eigenspectra can account for
a fraction Σ_{k=1}^{n} P_k of the total variation; it is called the
cumulative proportion of variation and increases monotonically
with n. When it reaches P^int, it indicates that the first n
eigenspectra are sufficient for explaining the intrinsic variation in
the sample, and the remaining higher order eigenspectra
contribute only to noise and can be ignored for our purposes (see
also §4.1).
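As a small worked illustration of Equation (C3) and the cumulative criterion, the sketch below converts a list of singular values λ_k into proportions P_k and finds the smallest n whose cumulative proportion reaches a given P^int. The function names, the singular values, and the value of P^int are made up for the example and do not come from the SPCA results.

```python
def variance_proportions(singular_values):
    """P_k = lambda_k^2 / sum of all lambda^2 (Equation C3)."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    return [q / total for q in squares]

def n_eigenspectra_needed(proportions, p_int):
    """Smallest n whose cumulative proportion reaches p_int.

    Returns the total number of eigenspectra if p_int is never reached.
    """
    cumulative = 0.0
    for n, p in enumerate(proportions, start=1):
        cumulative += p
        if cumulative >= p_int:
            return n
    return len(proportions)

# Hypothetical singular values in decreasing order and a hypothetical P_int.
proportions = variance_proportions([3.0, 2.0, 1.0, 1.0])
print(proportions)                               # first entry is 9/15 = 0.6
print(n_eigenspectra_needed(proportions, 0.85))  # 2
```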

The quantities defined above were widely used in previous 
SPCA studies, but they only refer to the average properties 
of the eigenspectra over the entire wavelength range. In this 
work, we develop these quantities to investigate the eigen- 
spectra in each wavelength bin, as follows. 



Figure C1. The sum of p_jk, weighted by var^tot_j (that is,
Σ_j var^tot_j p_jk / Σ_j var^tot_j × 100%), versus P_k, expressed in
percentage. k increases from 1 to 30 from bottom left to top right. The two
quantities, derived from our SPCA results, are almost equal. This implies
that the previously used quantity P_k can be considered as the var^tot_j-weighted
sum of p_jk. The new quantity p_jk has the advantage that it contains more
diagnostic power.



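First, the total and noise variation in the jth wavelength bin
can be simply defined as

$$\mathrm{var}^{\rm tot}_{j} = \sum_{i} f_{ij}^{2} \qquad \mathrm{(C4)}$$

and

$$p^{\rm err}_{j} = \mathrm{var}^{\rm err}_{j} / \mathrm{var}^{\rm tot}_{j}, \qquad \mathrm{var}^{\rm err}_{j} = \sum_{i} \sigma_{ij}^{2}. \qquad \mathrm{(C5)}$$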



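Second, after using the first n eigenspectra for reconstruction,
the residual variation in the jth wavelength bin is
the sum of the squares of the differences between the
reconstructed spectra and mean-subtracted spectra in the specific
wavelength bin:

$$\mathrm{var}^{\rm res}_{j,n} = \sum_{i} \left(f_{ij} - \sum_{k=1}^{n} w_{ik} v_{jk}\right)^{2}. \qquad \mathrm{(C6)}$$

The proportion of the variation accounted for by the nth
eigenspectrum in the jth wavelength bin is

$$p_{jn} = \frac{\mathrm{var}^{\rm res}_{j,n-1} - \mathrm{var}^{\rm res}_{j,n}}{\mathrm{var}^{\rm tot}_{j}}. \qquad \mathrm{(C7)}$$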

As in the case applied to the entire wavelength range, these 
values are compared with pj 1 to determine the significance of 
each eigenspectrum. 

Thus, we obtain a matrix P, which has the same shape as V,
whose element p_jk represents the proportion of the variation
accounted for by the kth eigenspectrum in the jth wavelength
bin. Each column (p_k) of P represents the contribution of the
corresponding eigenspectrum and has the form of a spectrum;
we call it the fractional-contribution spectrum. As
shown in Figure C1, the previously used P_k can be considered
as the sum of p_jk weighted by var^tot_j. The two quantities
for the first 30 eigenspectra of our SPCA results are almost
exactly equal (k increases diagonally to the upper right).
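The construction of the fractional-contribution spectrum can be illustrated on a toy decomposition. The sketch below assumes a noise-free flux matrix built as F = W · V^T from hypothetical weights and two orthonormal eigenspectra, and evaluates p_jn as the drop in residual variation in bin j when the nth eigenspectrum is added, divided by the total variation in that bin. All names and numbers are illustrative and not taken from the SPCA results.

```python
def fractional_contribution(weights, eigenspectra):
    """Return (p, var_tot) with p[j][n-1] the fractional contribution of
    eigenspectrum n in wavelength bin j.

    weights[i][k]      : weight of eigenspectrum k for spectrum i
    eigenspectra[j][k] : value of eigenspectrum k in wavelength bin j
    """
    n_spec, n_bins, n_eig = len(weights), len(eigenspectra), len(eigenspectra[0])

    def reconstruct(i, j, n):
        # Flux of spectrum i in bin j using only the first n eigenspectra.
        return sum(weights[i][k] * eigenspectra[j][k] for k in range(n))

    flux = [[reconstruct(i, j, n_eig) for j in range(n_bins)] for i in range(n_spec)]
    var_tot = [sum(flux[i][j] ** 2 for i in range(n_spec)) for j in range(n_bins)]

    def var_res(j, n):
        # Residual variation in bin j after using the first n eigenspectra.
        return sum((flux[i][j] - reconstruct(i, j, n)) ** 2 for i in range(n_spec))

    p = [[(var_res(j, n - 1) - var_res(j, n)) / var_tot[j] for n in range(1, n_eig + 1)]
         for j in range(n_bins)]
    return p, var_tot

# Two wavelength bins, two orthonormal eigenspectra, four spectra (toy values).
eigenspectra = [[0.8, -0.6],
                [0.6, 0.8]]
weights = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
p, var_tot = fractional_contribution(weights, eigenspectra)

# The var_tot-weighted sum of p_j1 recovers the global proportion P_1 = 8/10.
weighted_p1 = sum(v * row[0] for v, row in zip(var_tot, p)) / sum(var_tot)
print(p, weighted_p1)
```

In this toy case the first eigenspectrum explains a larger share of the variation in the first bin than in the second, while the weighted sum over bins returns the single global proportion, mirroring the relation shown in Figure C1.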